Audio signal decoder, audio signal encoder, method for decoding an audio signal, method for encoding an audio signal and computer program using a pitch-dependent adaptation of a coding context

Abstract:
AUDIO SIGNAL DECODER, AUDIO SIGNAL ENCODER, METHOD FOR DECODING AN AUDIO SIGNAL, METHOD FOR ENCODING AN AUDIO SIGNAL AND COMPUTER PROGRAM USING A PITCH-DEPENDENT ADAPTATION OF A CODING CONTEXT. An audio signal decoder (150) for providing a decoded audio signal representation (154) based on an encoded audio signal representation (152), comprising an encoded spectrum representation (ac_spectral_data[]) and an encoded time-warp information (tw_data[]), comprises a context-based spectral value decoder (160) configured to decode a codeword (acod_m) that describes one or more spectral values, or at least a part (m) of a numerical representation of one or more spectral values, in dependence on a context state, to obtain decoded spectral values (162, 297, x_ac_dec[]). The audio signal decoder also comprises a context state determiner (164, c) configured to determine a current context state in dependence on one or more previously decoded spectral values (162, 297).

Publication number: BR112012022744B1
Application number: R112012022744-0
Filing date: 2011-03-09
Publication date: 2021-02-17
Inventors: Ralf Geiger; Bernd Edler; Sascha Disch; Lars Villemoes; Tom Bäckström; Stefan Bayer
Applicants: Fraunhofer-Gesellschaft Zur Förderung Der Angewandten Forschung E.V.; Dolby International AB

Description:
[0001] Embodiments according to the invention relate to an audio signal decoder for providing a decoded audio signal representation based on an encoded audio signal representation.

[0002] Further embodiments according to the invention relate to an audio signal encoder for providing an encoded representation of an input audio signal.

[0003] Further embodiments according to the invention relate to a method for providing a decoded audio signal representation based on an encoded audio signal representation.

[0004] Further embodiments according to the invention relate to a method for providing an encoded representation of an input audio signal.

[0005] Further embodiments according to the invention relate to computer programs.

[0006] Some embodiments according to the invention relate to a concept for adapting the context of an arithmetic coder using warp information, which can be used in combination with a time-warped modified discrete cosine transform (briefly designated as TW-MDCT).

BACKGROUND OF THE INVENTION

[0007] In the following, a brief introduction will be given into the field of time-warped audio coding, concepts of which can be applied in some of the embodiments of the invention.

[0008] In recent years, techniques have been developed for transforming an audio signal into a frequency-domain representation and for efficiently encoding that frequency-domain representation, for example, taking into account perceptual masking thresholds. This concept of audio signal encoding is particularly efficient if the block length, for which a set of encoded spectral coefficients is transmitted, is long, and if only a comparatively small number of spectral coefficients lies well above the global masking threshold, while a large number of spectral coefficients lies close to or below the global masking threshold and can therefore be neglected (or encoded with minimum code length).
A spectrum in which this condition holds is sometimes called a sparse spectrum.

[0009] For example, lapped transforms based on cosine or sine functions are commonly used in source-coding applications because of their energy compaction properties. That is, for harmonic tones with constant fundamental frequency (pitch), they concentrate the signal energy into a small number of spectral components (sub-bands), which leads to an efficient signal representation.

[0010] In general, the (fundamental) pitch of a signal shall be understood to be the lowest dominant frequency distinguishable in the signal spectrum. In the common speech model, the pitch is the frequency of the excitation signal modulated by the human throat. If only a single fundamental frequency were present, the spectrum would be extremely simple, comprising only the fundamental frequency and its overtones. Such a spectrum could be encoded highly efficiently. For signals with varying pitch, however, the energy corresponding to each harmonic component is spread over several transform coefficients, which leads to a reduction of the coding efficiency.

[0011] In order to overcome this reduction of the coding efficiency, the audio signal to be encoded is effectively resampled on a non-uniform temporal grid. In subsequent processing, the sample positions obtained by the non-uniform resampling are processed as if they represented values on a uniform temporal grid. This operation is commonly denoted by the phrase "time warping". The sample times may advantageously be chosen in dependence on the temporal variation of the pitch, such that a pitch variation in the time-warped version of the audio signal is smaller than a pitch variation in the original version of the audio signal (before time warping). After the time warping of the audio signal, the time-warped version of the audio signal is converted into the frequency domain.
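The resampling idea above can be illustrated with a toy sketch (not the codec's actual algorithm; the chirp parameters and the closed-form phase inversion are chosen purely for illustration): a linear chirp, whose instantaneous frequency rises over time, is sampled at non-uniform instants chosen so that its phase advances by a constant amount per sample, which turns it into a constant-frequency tone before any transform is applied.

```python
import math

def chirp(t, f0=100.0, k=50.0):
    # Linear chirp: instantaneous frequency f0 + k*t,
    # phase (in cycles) f0*t + 0.5*k*t^2.
    return math.cos(2 * math.pi * (f0 * t + 0.5 * k * t * t))

def warp_time(n, dphi_cycles, f0=100.0, k=50.0):
    # Time at which the chirp's phase has advanced by n*dphi_cycles
    # cycles: solve 0.5*k*t^2 + f0*t - c = 0 for t >= 0.
    c = n * dphi_cycles
    return (-f0 + math.sqrt(f0 * f0 + 2 * k * c)) / k

# Sampling the chirp on this non-uniform grid yields a pure tone whose
# phase grows by exactly 0.01 cycles per (warped) sample.
samples = [chirp(warp_time(n, 0.01)) for n in range(5)]
```

Because the warped samples trace a constant-frequency tone, a transform applied to them concentrates the energy into few coefficients, which is exactly the sparsity the paragraph above aims for.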
The pitch-dependent time warping has the effect that the frequency-domain representation of the time-warped audio signal typically exhibits an energy compaction into a much smaller number of spectral components than the frequency-domain representation of the original audio signal (i.e., the audio signal not warped in time).

[0012] On the decoder side, the frequency-domain representation of the time-warped audio signal is converted back into the time domain, so that a time-domain representation of the time-warped audio signal is available at the decoder. However, the time-domain representation of the time-warped audio signal reconstructed at the decoder does not include the original pitch variations of the encoder-side input audio signal. Accordingly, yet another time warping by resampling is applied to the decoder-side reconstructed time-domain representation of the time-warped audio signal.

[0013] In order to obtain a good reconstruction of the encoder-side input audio signal at the decoder, it is desirable that the decoder-side time warping be, at least approximately, the inverse operation with respect to the encoder-side time warping. In order to obtain an appropriate time warping, it is desirable to have information available at the decoder which allows an adjustment of the decoder-side time warping.

[0014] Since it is typically necessary to transfer this information from the audio signal encoder to the audio signal decoder, it is desirable to keep the bitrate required for this transmission small, while maintaining a reliable reconstruction of the required time-warp information at the decoder.

[0015] Furthermore, the coding efficiency when encoding or decoding spectral values is sometimes increased by the use of a context-dependent encoder or a context-dependent decoder.
[0016] However, it has been found that the coding efficiency of an audio encoder or an audio decoder is often comparatively low in the presence of a variation of the fundamental frequency or pitch, even if the time-warping concept is applied.

[0017] In view of this situation, there is a desire to have a concept which allows for a good coding efficiency even in the presence of a variation of the fundamental frequency.

SUMMARY OF THE INVENTION

[0018] An embodiment according to the invention creates an audio signal decoder for providing a decoded audio signal representation based on an encoded audio signal representation comprising an encoded spectrum representation and an encoded time-warp information. The audio signal decoder comprises a context-based spectral value decoder configured to decode a codeword that describes one or more spectral values, or at least a portion of a numerical representation of one or more spectral values, in dependence on a context state, to obtain decoded spectral values. The audio signal decoder also comprises a context state determiner configured to determine a current context state in dependence on one or more previously decoded spectral values. The audio signal decoder also comprises a time-warping frequency-domain-to-time-domain converter configured to provide a time-warped time-domain representation of a given audio frame based on a set of decoded spectral values associated with the given audio frame and provided by the context-based spectral value decoder, and in dependence on the time-warp information. The context state determiner is configured to adapt the context state determination to a change of a fundamental frequency between subsequent frames.
[0019] This embodiment according to the invention is based on the finding that the coding efficiency achieved by a context-based spectral value decoder in the presence of an audio signal having a temporally varying fundamental frequency is improved if the context state is adapted to the change of the fundamental frequency between subsequent frames. A change of the fundamental frequency over time (which is equivalent to a pitch variation in many cases) has the effect that the spectrum of a given audio frame is typically similar to a frequency-scaled version of the spectrum of a previous audio frame (preceding the given audio frame), so that adapting the context determination in dependence on the change of the fundamental frequency allows this similarity to be exploited in order to improve the coding efficiency.

[0020] In other words, it has been found that the coding efficiency (or decoding efficiency) of a context-based spectral value coding is comparatively poor in the presence of a significant change of the fundamental frequency between two subsequent frames, and that the coding efficiency can be improved by adapting the determination of the context state in this situation. The adaptation of the context state determination allows similarities between the spectra of the previous audio frame and the current audio frame to be exploited, while also taking into account the systematic differences between those spectra, such as, for example, the frequency scaling of the spectrum which typically appears in the presence of a change of the fundamental frequency over time (i.e., between two audio frames).
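The finding above can be made concrete with a small numeric sketch (hand-made for illustration; it is not part of the patent text): if the fundamental frequency changes between frames, each harmonic of the current frame lines up with the same harmonic of the previous frame after an inverse frequency scaling of the bin index.

```python
def matching_previous_bin(current_bin, f0_prev, f0_curr):
    # A pitch change scales all harmonic positions by f0_curr / f0_prev,
    # so the spectrally "similar" bin of the previous frame is found by
    # undoing that scaling (rounded to the nearest bin).
    return int(round(current_bin * f0_prev / f0_curr))

# With 10 Hz bin spacing: harmonics of a 100 Hz tone sit at bins
# 10, 20, 30, ...; after the pitch rises to 125 Hz they sit near bins
# 12.5, 25, 37.5, ... Mapping a current-frame bin back:
print(matching_previous_bin(25, 100.0, 125.0))  # 2nd harmonic -> bin 20
```

A context built around bin 20 of the previous frame is therefore far more informative for decoding bin 25 of the current frame than a context around bin 25 would be, which is exactly why the frequency-scaled context pays off.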
[0021] In summary, this embodiment according to the invention helps to improve the coding efficiency without requiring additional side information or bitrate (assuming that an information describing the change of the fundamental frequency between subsequent frames is available anyway in an audio bitstream, by virtue of the time-warping concept of an audio signal encoder or decoder).

[0022] In a preferred embodiment, the time-warping frequency-domain-to-time-domain converter comprises a normal (non-time-warping) frequency-domain-to-time-domain converter configured to provide a time-domain representation of a given audio frame based on a set of decoded spectral values associated with the given audio frame and provided by the context-based spectral value decoder, and a time-warping resampler configured to resample the time-domain representation of the given audio frame, or a processed version thereof, in dependence on the time-warp information, to obtain a resampled (time-warped) time-domain representation of the given audio frame. This implementation of a time-warping frequency-domain-to-time-domain converter is easy to realize, since it comprises a "standard" frequency-domain-to-time-domain converter and, as a functional extension, a time-warping resampler whose functionality can be independent of the functionality of the frequency-domain-to-time-domain converter. Accordingly, the frequency-domain-to-time-domain converter can be reused both in an operating mode in which the time warping is inactive and in an operating mode in which the time warping is active.

[0023] In a preferred embodiment, the time-warp information describes a variation of a pitch over time. In this embodiment, the context state determiner is configured to derive a frequency stretching information (i.e., a frequency scaling information) from the time-warp information.
[0024] In addition, the context state determiner is preferably configured to stretch or compress a previous context, which is associated with a previous audio frame, along the frequency axis in dependence on the frequency stretching information, in order to obtain an adapted context for a context-based decoding of one or more spectral values of a current audio frame. It has been found that a time-warp information which describes a variation of a pitch over time is well-suited for deriving a frequency stretching information. Moreover, it has been found that stretching or compressing the previous context associated with a previous audio frame along the frequency axis typically results in a stretched or compressed context which allows for the derivation of a meaningful context state information, which is well-adapted to the spectrum of the current audio frame and consequently brings along a good coding efficiency.

[0025] In a preferred embodiment, the context state determiner is configured to derive a first average frequency information over a first audio frame from the time-warp information, and to derive a second average frequency information over a second audio frame, which follows the first audio frame, from the time-warp information. In this case, the context state determiner is configured to compute a ratio between the second average frequency information over the second audio frame and the first average frequency information over the first audio frame, in order to determine the frequency stretching information. It has been found that the average frequency information can typically be derived from the time-warp information with little effort, and it has also been found that the ratio between the first and second average frequency information allows for a computationally efficient derivation of the frequency stretching information.
[0026] In another preferred embodiment, the context state determiner is configured to derive a first average time-warp contour information over a first audio frame from the time-warp information, and to derive a second average time-warp contour information over a second audio frame, which follows the first audio frame, from the time-warp information. In this case, the context state determiner is configured to compute a ratio between the first average time-warp contour information over the first audio frame and the second average time-warp contour information over the second audio frame, in order to determine the frequency stretching information. It has been found that it is computationally particularly efficient to compute the averages of a time-warp contour information over the first and second audio frames (which may overlap), and that a ratio between said first average time-warp contour information and said second average time-warp contour information provides a sufficiently precise frequency stretching information.

[0027] In a preferred embodiment, the context state determiner is configured to derive the first and second average frequency information, or the first and second average time-warp contour information, from a common time-warp contour which extends over a plurality of consecutive audio frames. It has been found that the concept of establishing a common time-warp contour which extends over a plurality of consecutive audio frames not only facilitates a precise and distortion-free computation of the resampling instants, but also provides a very good basis for estimating a change of the fundamental frequency between two subsequent audio frames. Accordingly, the common time-warp contour has been identified as a very good means for identifying a change of a relative frequency over time between different audio frames.
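Paragraphs [0025] and [0026] can be condensed into two tiny sketches (the function names and the per-sample contour representation are illustrative assumptions, not the normative algorithm):

```python
def stretch_from_avg_frequencies(avg_freq_first, avg_freq_second):
    # Paragraph [0025]: ratio of the second frame's average frequency
    # to the first frame's average frequency.
    return avg_freq_second / avg_freq_first

def stretch_from_warp_contour(contour_first, contour_second):
    # Paragraph [0026]: ratio of the average time-warp contour values
    # over the two (possibly overlapping) frames; averaging is cheap,
    # one pass per frame.
    avg_first = sum(contour_first) / len(contour_first)
    avg_second = sum(contour_second) / len(contour_second)
    return avg_first / avg_second
```

With a warp contour that drops from 1.0 to 0.8 between frames (i.e., the relative pitch period shortens), both variants yield the same stretching factor of 1.25, by which the previous context would be stretched along the frequency axis.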
[0028] In a preferred embodiment, the audio signal decoder comprises a time-warp contour calculator configured to calculate a time-warp contour information, describing a temporal evolution of a relative pitch over a plurality of consecutive audio frames, based on the time-warp information. In this case, the context state determiner is configured to use the time-warp contour information in order to derive the frequency stretching information. It has been found that the time-warp contour information, which may, for example, be defined for each sample of an audio frame, provides a very good basis for the adaptation of the context state determination.

[0029] In a preferred embodiment, the audio signal decoder comprises a resampling position calculator. The resampling position calculator is configured to calculate resampling positions for use by the time-warping resampler based on the time-warp contour information, such that a temporal variation of the resampling positions is determined by the time-warp contour information. It has been found that the common use of the time-warp contour information for the determination of the frequency stretching information and for the determination of the resampling positions has the effect that a stretched context, which is obtained by applying the frequency stretching information, is well-adapted to the spectral characteristics of a current audio frame whose audio signal is, at least approximately, a continuation of the audio signal of the previous audio frame as reconstructed by the resampling operation using the calculated resampling positions.
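One possible reading of the resampling position computation is sketched below (a simplified sketch under the assumption that the warp contour holds one relative-pitch value per input sample; the actual codec additionally handles frame overlap and transition lengths): output samples are placed at uniform steps of accumulated "warped time", and each step is mapped back to a fractional input-sample position.

```python
import bisect

def resampling_positions(warp_contour, n_out):
    # Cumulative "warped time" at each input sample boundary.
    cum = [0.0]
    for w in warp_contour:
        cum.append(cum[-1] + w)
    total = cum[-1]
    positions = []
    for k in range(n_out):
        # Uniform step in warped time ...
        t = total * k / n_out
        # ... mapped back to a fractional input position by linear
        # interpolation of the cumulative contour.
        i = min(bisect.bisect_right(cum, t) - 1, len(warp_contour) - 1)
        frac = (t - cum[i]) / (cum[i + 1] - cum[i])
        positions.append(i + frac)
    return positions
```

A flat contour reproduces the uniform grid, while a contour that starts high and falls yields positions that are denser at the beginning, i.e., the temporal variation of the positions follows the contour, as paragraph [0029] requires.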
[0030] In a preferred embodiment, the context state determiner is configured to derive a current numerical context value in dependence on a plurality of previously decoded spectral values (which may be included in, or described by, a context memory structure), and to select a mapping rule describing a mapping of a code value onto a symbol code representing one or more spectral values, or a portion of a numerical representation of one or more spectral values, in dependence on the current numerical context value. In this case, the context-based spectral value decoder is configured to decode the code value describing one or more spectral values, or at least a portion of a numerical representation of one or more spectral values, using the mapping rule selected by the context state determiner. It has been found that a context adaptation, in which a current numerical context value is derived from a plurality of previously decoded spectral values, and in which a mapping rule is selected in accordance with said (current) numerical context value, benefits significantly from an adaptation of the determination of the context state, i.e., of the (current) numerical context value, since the selection of a significantly inappropriate mapping rule can be avoided by using this concept. Conversely, if the derivation of the context state, i.e., of the current numerical context value, were not adapted in dependence on the change of the fundamental frequency between subsequent frames, a wrong selection of a mapping rule would typically occur in the presence of a change of the fundamental frequency, such that the coding gain would be reduced. This reduction of the coding gain is avoided by the described mechanism.
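The mechanism of paragraph [0030] might be sketched as follows (the packing of neighbour magnitudes and the table lookup are deliberately simplistic placeholders; the real codec derives the context from a defined spectral neighbourhood and selects among standardized cumulative-frequency tables):

```python
def numerical_context_value(neighbour_values, base=4):
    # Pack the clipped magnitudes of a few previously decoded
    # neighbouring spectral values into one integer context value.
    packed = 0
    for v in neighbour_values:
        packed = packed * base + min(abs(v), base - 1)
    return packed

def select_mapping_rule(context_value, mapping_tables):
    # The numerical context value selects the mapping rule; here a
    # plain modulo lookup stands in for the codec's table mapping.
    return mapping_tables[context_value % len(mapping_tables)]
```

If the context state determination were not frequency-adapted, `neighbour_values` would be taken from the wrong bins after a pitch change, `numerical_context_value` would change, and `select_mapping_rule` would pick a table whose symbol statistics do not match the data, which is exactly the coding-gain loss described above.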
[0031] In a preferred embodiment, the context state determiner is configured to set up and update a preliminary context memory structure, such that the entries of the preliminary context memory structure describe one or more spectral values of a first audio frame, wherein the entry indices of the entries of the preliminary context memory structure are indicative of a frequency bin, or of a set of adjacent frequency bins, of the frequency-domain-to-time-domain converter with which the respective entries are associated (for example, in a provision of a time-domain representation of the first audio frame). The context state determiner is further configured to obtain a frequency-scaled context memory structure based on the preliminary context memory structure, such that a given entry or sub-entry of the preliminary context memory structure having a first frequency index is mapped onto a corresponding entry or sub-entry of the frequency-scaled context memory structure having a second frequency index. The second frequency index is associated with a different frequency bin, or a different set of adjacent frequency bins, of the frequency-domain-to-time-domain converter than the first frequency index.

[0032] In other words, an entry of the preliminary context memory structure, which is obtained on the basis of one or more spectral values corresponding to an i-th spectral bin of the frequency-domain-to-time-domain converter (or to an i-th set of spectral bins of the frequency-domain-to-time-domain converter), is mapped onto a frequency-scaled context memory entry which is associated with a j-th frequency bin (or a j-th set of frequency bins) of the frequency-domain-to-time-domain converter, where j is different from i. It has been found that this concept of mapping entries of the preliminary context memory structure onto entries of the frequency-scaled context memory structure provides a computationally particularly efficient way of adapting the context state determination to a change of the fundamental frequency.
A frequency scaling of the context can be achieved with low effort using this concept. Likewise, the derivation of the current numerical context value from the frequency-scaled context memory structure may be identical to the derivation of the current numerical context value from a conventional (for example, preliminary) context memory structure in the absence of a significant pitch variation. Thus, the described concept allows the context adaptation to be implemented in an existing audio decoder with minimal effort.

[0033] In a preferred embodiment, the context state determiner is configured to derive a context state value, which describes the current context state, for a decoding of a codeword describing one or more spectral values of a second audio frame, or at least a portion of a numerical representation of one or more spectral values of a second audio frame, having associated a third frequency index, using values of the frequency-scaled context memory structure whose frequency indices are in a predetermined relationship with the third frequency index. In this case, the third frequency index designates a frequency bin, or a set of adjacent frequency bins, of the frequency-domain-to-time-domain converter with which the one or more spectral values of the audio frame to be decoded using the current context state value are associated.

[0034] It has been found that the use of a predetermined (and preferably fixed) relative neighbourhood (in terms of frequency bins) of the one or more spectral values to be decoded, for the derivation of the context state value (for example, a current numerical context value), allows the computation of said context state value to be kept reasonably simple. By using the frequency-scaled context memory structure as an input to the derivation of the context state value, a variation of the fundamental frequency can be taken into account efficiently.
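One way to picture the i-to-j mapping of paragraphs [0031] and [0032] is the following sketch (illustrative only; the rounding rule and the zero-fill for out-of-range source bins are assumptions):

```python
def frequency_scale_context(prelim, stretch):
    # For every target bin j of the frequency-scaled structure, fetch
    # the entry of the preliminary structure at the corresponding
    # source bin i = round(j / stretch); the ratio j / i thus follows
    # the fundamental-frequency change.
    n = len(prelim)
    scaled = []
    for j in range(n):
        i = int(j / stretch + 0.5)
        scaled.append(prelim[i] if i < n else 0)
    return scaled

# Stretch factor 2: each previous-frame entry reappears at (roughly)
# twice its original frequency index.
print(frequency_scale_context([10, 20, 30, 40], 2.0))  # [10, 20, 20, 30]
```

With `stretch = 1.0` the function is the identity, matching the remark above that the subsequent context-value derivation can stay unchanged when there is no significant pitch variation.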
[0035] In a preferred embodiment, the context state determiner is configured to set each of a plurality of entries of the frequency-scaled context memory structure, having a target frequency index, to the value of a corresponding entry of the preliminary context memory structure having a corresponding source frequency index. The context state determiner is configured to determine corresponding frequency indices of an entry of the frequency-scaled context memory structure and of a corresponding entry of the preliminary context memory structure, such that a ratio between said corresponding frequency indices is determined by the change of the fundamental frequency between the current audio frame, with which the entries of the preliminary context memory structure are associated, and a subsequent audio frame, whose decoding context is determined by the entries of the frequency-scaled context memory structure. By using this concept for deriving the entries of the frequency-scaled context memory structure, the complexity can be kept small, while it is still possible to adapt the frequency-scaled context memory structure to the change of the fundamental frequency.

[0036] In a preferred embodiment, the context state determiner is configured to set up the preliminary context memory structure such that each of a plurality of entries of the preliminary context memory structure is based on a plurality of spectral values of a first audio frame, wherein the entry indices of the entries of the preliminary context memory structure are indicative of a set of adjacent frequency bins of the frequency-domain-to-time-domain converter with which the respective entries are associated (with respect to the first audio frame). The context state determiner is configured to extract preliminary per-frequency-bin individual context values, having associated individual frequency bin indices, from the entries of the preliminary context memory structure.
In addition, the context state determiner is configured to obtain frequency-scaled per-frequency-bin individual context values, having associated individual frequency bin indices, such that a given preliminary per-frequency-bin individual context value having a first frequency bin index is mapped onto a corresponding frequency-scaled per-frequency-bin individual context value having a second frequency bin index, so that an individual, per-frequency-bin mapping of the preliminary per-frequency-bin individual context values is obtained. The context state determiner is further configured to combine a plurality of frequency-scaled per-frequency-bin individual context values into a combined entry of the frequency-scaled context memory structure. In this way, it is possible to adapt the frequency-scaled context memory structure to a change of the fundamental frequency with a very fine granularity, even if a plurality of frequency bins are summarized in a single entry of the context memory structure. Thus, a particularly precise adaptation of the context to the change of the fundamental frequency can be achieved.

[0037] Another embodiment according to the invention creates an audio signal encoder for providing an encoded representation of an input audio signal comprising an encoded spectrum representation and an encoded time-warp information. The audio signal encoder comprises a frequency domain representation provider configured to provide a frequency domain representation which represents a time-warped version of the input audio signal, time-warped in accordance with a time-warp information.
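The split-remap-recombine scheme of paragraph [0036] and the passage above can be sketched as follows (the 2-bit-per-bin packing and the choice of four bins per entry are invented for illustration; only the three-step structure reflects the text):

```python
BINS_PER_ENTRY = 4

def split_entry(entry):
    # Step 1: extract per-frequency-bin individual context values
    # (here: four 2-bit values packed into one entry).
    return [(entry >> (2 * b)) & 0x3 for b in range(BINS_PER_ENTRY)]

def combine_bins(bins):
    # Step 3: combine per-bin values back into a grouped entry.
    entry = 0
    for b, v in enumerate(bins):
        entry |= (v & 0x3) << (2 * b)
    return entry

def rescale_grouped_context(entries, stretch):
    per_bin = [v for e in entries for v in split_entry(e)]
    n = len(per_bin)
    # Step 2: remap each individual bin through the frequency scaling
    # (same pull-from-source-bin rule as for ungrouped entries).
    remapped = [per_bin[min(int(j / stretch + 0.5), n - 1)]
                for j in range(n)]
    return [combine_bins(remapped[i:i + BINS_PER_ENTRY])
            for i in range(0, n, BINS_PER_ENTRY)]
```

Because the remapping happens at bin resolution rather than entry resolution, a stretch factor that shifts the spectrum by less than one group width still moves the individual context values, which is the fine granularity the paragraph refers to.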
The audio signal encoder further comprises a context-based spectral value encoder configured to provide a codeword which describes one or more spectral values of the frequency domain representation, or at least a portion of a numerical representation of one or more spectral values of the frequency domain representation, in dependence on a context state, to obtain encoded spectral values of the encoded spectrum representation. The audio signal encoder also comprises a context state determiner configured to determine a current context state in dependence on one or more previously encoded spectral values. The context state determiner is configured to adapt the context state determination to a change of a fundamental frequency between subsequent frames.

[0038] This audio signal encoder is based on the same ideas and findings as the audio signal decoder described above. Moreover, the audio signal encoder can be supplemented by any of the aspects and features discussed with respect to the audio signal decoder, wherein the previously encoded spectral values take the role of the previously decoded spectral values in the context state computation.

[0039] In a preferred embodiment, the context state determiner is configured to derive a current numerical context value in dependence on a plurality of previously encoded spectral values, and to select a mapping rule describing a mapping of one or more spectral values, or of a portion of a numerical representation of one or more spectral values, onto a code value in dependence on the current numerical context value. In this case, the context-based spectral value encoder is configured to provide the code value describing one or more spectral values, or at least a portion of a numerical representation of one or more spectral values, using the mapping rule selected by the context state determiner.
[0040] Another embodiment according to the invention creates a method for providing a decoded audio signal representation based on an encoded audio signal representation.

[0041] Another embodiment according to the invention creates a method for providing an encoded representation of an input audio signal.

[0042] Another embodiment according to the invention creates a computer program for performing one of said methods.

[0043] The methods and the computer program are based on the same considerations as the audio signal decoder and the audio signal encoder discussed above.

[0044] Moreover, the audio signal encoder, the methods and the computer programs can be supplemented by any of the aspects and features discussed above and described below with respect to the audio signal decoder.

BRIEF DESCRIPTION OF THE FIGURES

[0045] Embodiments according to the present invention will subsequently be described with reference to the enclosed figures, in which:
Figure 1a shows a schematic block diagram of an audio signal encoder, according to an embodiment of the invention;
Figure 1b shows a schematic block diagram of an audio signal decoder, according to an embodiment of the invention;
Figure 2a shows a schematic block diagram of an audio signal encoder, according to another embodiment of the invention;
Figure 2b shows a schematic block diagram of an audio signal decoder, according to another embodiment of the invention;
Figure 2c shows a schematic block diagram of an arithmetic encoder for use in the audio signal encoder, according to embodiments of the invention;
Figure 2d shows a schematic block diagram of an arithmetic decoder for use in the audio signal decoder, according to embodiments of the invention;
Figure 3a shows a graphical representation of a context-adaptive arithmetic coding (encoding/decoding);
Figure 3b shows a graphical representation of relative pitch contours;
Figure 3c shows a graphical representation of a stretching effect of the time-warped modified discrete cosine transform (TW-MDCT);
Figure 4a shows a schematic block diagram of a context state determiner for use in audio signal encoders and audio signal decoders, according to embodiments of the present invention;
Figure 4b shows a graphical representation of a frequency compression of the context, which can be performed by the context state determiner according to Figure 4a;
Figure 4c shows a pseudo-program-code representation of an algorithm for stretching or compressing a context, which can be applied in embodiments according to the invention;
Figures 4d and 4e show a pseudo-program-code representation of an algorithm for stretching or compressing a context, which can be used in embodiments according to the invention;
Figures 5a, 5b show a detailed extract of a schematic block diagram of an audio signal decoder, according to an embodiment of the invention;
Figures 6a, 6b show a detailed extract of a flowchart of a method for providing a decoded audio signal representation, according to an embodiment of the invention;
Figure 7a shows a legend of definitions of data elements and help elements, which are used in an audio decoder according to an embodiment of the invention;
Figure 7b shows a legend of definitions of constants, which are used in an audio decoder according to an embodiment of the invention;
Figure 8 shows a table representation of a mapping of a codeword index onto a corresponding decoded time-warp value;
Figure 9 shows a pseudo-program-code representation of an algorithm for linearly interpolating between equally spaced warp nodes;
Figure 10a shows a pseudo-program-code representation of a helper function "warp_time_inv";
Figure 10b shows a pseudo-program-code representation of a helper function "warp_inv_vec";
Figure 11 shows a pseudo-program-code representation of an algorithm for computing a sample position vector and a transition length;
Figure 12 shows a table representation of values of a synthesis window length N in dependence on a window sequence and a core coder frame length;
Figure 13 shows a table representation of allowed window sequences;
Figure 14 shows a pseudo-program-code representation of an algorithm for a windowing and an internal overlap-and-add of a window sequence of type "EIGHT_SHORT_SEQUENCE";
Figure 15 shows a pseudo-program-code representation of an algorithm for a windowing and an internal overlap-and-add of other window sequences, which are not of type "EIGHT_SHORT_SEQUENCE";
Figure 16 shows a pseudo-program-code representation of an algorithm for a resampling;
Figure 17 shows a graphical representation of a context for a state calculation, as it can be used in some embodiments according to the invention;
Figure 18 shows a legend of definitions;
Figure 19 shows a pseudo-program-code representation of an algorithm "arith_map_context()";
Figure 20 shows a pseudo-program-code representation of an algorithm "arith_get_context()";
Figure 21 shows a pseudo-program-code representation of an algorithm "arith_get_pk()";
Figure 22 shows a pseudo-program-code representation of an algorithm "arith_decode()";
Figure 23 shows a pseudo-program-code representation of an algorithm for decoding one or more less-significant bit planes;
Figure 24 shows a pseudo-program-code representation of an algorithm for setting entries of a set of arithmetically decoded spectral values;
Figure 25 shows a pseudo-program-code representation of a function "arith_update_context()";
Figure 26 shows a pseudo-program-code representation of an algorithm "arith_finish()";
Figures 27a-27f show representations of syntax elements of the audio stream, according to an embodiment of the invention.

DETAILED DESCRIPTION OF THE EMBODIMENTS

1. AUDIO SIGNAL ENCODER ACCORDING TO FIGURE 1a

[0046] Figure 1a shows a schematic block diagram of an audio signal encoder 100, according to an embodiment of the invention.
[0047] The audio signal encoder 100 is configured to receive an input audio signal 110 and to provide an encoded representation 112 of the input audio signal. The encoded representation 112 of the input audio signal comprises an encoded spectrum representation and encoded time warp information.

[0048] The audio signal encoder 100 comprises a frequency domain representation provider 120 that is configured to receive the input audio signal 110 and time warp information 122. The frequency domain representation provider 120 (which can be considered to be a time-warping frequency domain representation provider) is configured to provide a frequency domain representation 124 that represents a version of the input audio signal 110 which is time-warped according to the time warp information 122. The audio signal encoder 100 also comprises a context-based spectral value encoder 130 configured to provide a codeword 132 that describes one or more spectral values of the frequency domain representation 124, or at least a part of a numerical representation of one or more spectral values of the frequency domain representation 124, depending on a context state, to obtain encoded spectral values of the encoded spectrum representation. The context state can, for example, be described by context state information 134. The audio signal encoder 100 also comprises a context state determiner 140 which is configured to determine a current context state depending on one or more previously encoded spectral values 124. The context state determiner 140 can therefore provide the context state information 134 to the context-based spectral value encoder 130, where the context state information can, for example, take the form of a current numerical context value (for selecting a mapping rule or mapping table) or of a reference to a selected mapping rule or mapping table. The context state determiner 140 is configured to adapt the determination of the context state to a change in a fundamental frequency between subsequent frames. Likewise, the context state determiner can evaluate information about a change in a fundamental frequency between subsequent audio frames. This information about the fundamental frequency change between subsequent frames can, for example, be based on the time warp information 122, which is used by the frequency domain representation provider 120.

[0049] Likewise, the audio signal encoder can provide a particularly high coding efficiency in the case that parts of the audio signal comprise a fundamental frequency that varies over time, or a pitch that varies over time, as the derivation of the context state information 134 is adapted to the variation of the fundamental frequency between two audio frames. Likewise, the context which is used by the context-based spectral value encoder 130 is well suited to the spectral compression (with respect to frequency) or spectral stretching (with respect to frequency) of the frequency domain representation 124 that occurs if the fundamental frequency changes from one audio frame to the next audio frame (that is, between the two audio frames). Consequently, the context state information 134 is well adapted, on average, to the frequency domain representation 124 even in the case of a fundamental frequency change, which in turn results in a good coding efficiency of the context-based spectral value encoder. It turned out that if, on the contrary, the context state were not adapted to the change in the fundamental frequency, the context would be inadequate in situations in which the fundamental frequency changes, resulting in a significant degradation of the coding efficiency.
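The frequency scaling of the context described above can be illustrated by a small sketch. The following Python fragment is a hypothetical illustration only (the function name, the nearest-neighbour mapping and the zero fill are assumptions, not the concrete algorithm of the embodiments): it remaps the per-bin context entries of the previous frame according to the ratio of the fundamental frequencies of the current and the previous frame.

```python
def adapt_context_to_pitch_change(prev_context, ratio):
    """Remap the previous frame's per-bin context entries along the
    frequency axis.

    prev_context: context entries (one per frequency bin) derived from
                  the previously encoded/decoded frame.
    ratio:        f0_current / f0_previous between the two frames.

    If the fundamental frequency rises (ratio > 1), the spectrum is
    stretched, so bin k of the current frame corresponds to bin
    k / ratio of the previous frame; the context is stretched
    accordingly (and compressed for ratio < 1).
    """
    n = len(prev_context)
    adapted = []
    for k in range(n):
        src = int(round(k / ratio))   # nearest-neighbour mapping (assumption)
        adapted.append(prev_context[src] if src < n else 0)  # zero fill (assumption)
    return adapted
```

For a constant fundamental frequency (ratio 1.0) the context is left unchanged; for ratio 2.0 each context entry is reused for roughly two adjacent bins, mimicking the stretched spectrum.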
[0050] Likewise, it can be said that the audio signal encoder 100 typically outperforms conventional audio signal encoders using context-based spectral value coding in situations where the fundamental frequency changes.

[0051] It should be noted here that there are many different implementations of how to adapt the determination of the context state to a change in the fundamental frequency between subsequent frames (that is, from a first frame to a second subsequent frame). For example, a context memory structure, whose entries are defined by or derived from the spectral values of the frequency domain representation 124 (or, more precisely, by their content), can be stretched or compressed along the frequency axis before a current numerical context value, which describes a context state, is derived. These concepts will be discussed in detail below. Alternatively, however, it is also possible to change (or adapt) the algorithm for deriving the context state information 134 from the entries of a context memory structure, whose entries are based on the frequency domain representation 124. For example, it could be adjusted which entries of this non-frequency-scaled context memory structure are considered, even though this solution is not discussed in detail here.

2. AUDIO SIGNAL DECODER ACCORDING TO FIGURE 1b

[0052] Figure 1b shows a schematic block diagram of an audio signal decoder 150.

[0053] The audio signal decoder 150 is configured to receive an encoded audio signal representation 152, which may comprise an encoded spectrum representation and encoded time warp information. The audio signal decoder 150 is configured to provide a decoded audio signal representation 154 based on the encoded audio signal representation 152.

[0054] The audio signal decoder 150 comprises a context-based spectral value decoder 160, which is configured to receive codewords of the encoded spectrum representation and to provide, based on these, decoded spectral values 162. Furthermore, the context-based spectral value decoder 160 is configured to receive context state information 164 which can, for example, take the form of a current numerical context value, of a selected mapping rule, or of a reference to a selected mapping rule. The context-based spectral value decoder 160 is configured to decode a codeword that describes one or more spectral values, or at least a part of a numerical representation of one or more spectral values, depending on a context state (which may be described by the context state information 164), to obtain the decoded spectral values 162. The audio signal decoder 150 also comprises a context state determiner 170 which is configured to determine a current context state depending on one or more previously decoded spectral values 162. The audio signal decoder 150 also comprises a time-warping frequency-domain-to-time-domain converter 180 which is configured to provide a time-warped time domain representation 182 based on a set of decoded spectral values 162 associated with a given audio frame and provided by the context-based spectral value decoder. The time-warping frequency-domain-to-time-domain converter 180 is configured to receive time warp information 184 in order to adapt the provision of the time-warped time domain representation 182 to the desired time warp described by the encoded audio signal representation 152, so that the time-warped time domain representation 182 constitutes the decoded audio signal representation 154 (or, equivalently, forms the basis of the decoded audio signal representation, if post-processing is used).
[0055] The time-warping frequency-domain-to-time-domain converter 180 may, for example, comprise a frequency-domain-to-time-domain converter configured to provide a time domain representation of a given audio frame based on a set of decoded spectral values 162 associated with the given audio frame and provided by the context-based spectral value decoder 160. The time-warping frequency-domain-to-time-domain converter may also comprise a time warp resampler configured to resample the time domain representation of the given audio frame, or a processed version thereof, depending on the time warp information 184, to obtain the resampled time domain representation 182 of the given audio frame.

[0056] Furthermore, the context state determiner 170 is configured to adapt the determination of the context state (which is described by the context state information 164) to a change in a fundamental frequency between subsequent audio frames (that is, from a first audio frame to a second subsequent audio frame).

[0057] The audio signal decoder 150 is based on the findings that have already been discussed in relation to the audio signal encoder 100. In particular, the audio signal decoder is configured to adapt the determination of the context state to a change in a fundamental frequency between subsequent audio frames, so that the context state (and, consequently, the assumptions used by the context-based spectral value decoder 160 regarding the statistical probability of the occurrence of different spectral values) is well adapted, at least on average, to the spectrum of a current audio frame to be decoded using said context information. Likewise, the codewords encoding the spectral values of said current audio frame can be relatively short, as a good match between the selected context, selected according to the context state information provided by the context state determiner 170, and the spectral values to be decoded generally results in comparatively short codewords, which brings with it a good bit rate efficiency.

[0058] In addition, the context state determiner 170 can be implemented efficiently, since the time warp information 184, which is included in the encoded audio signal representation 152 anyway for use by the time-warping frequency-domain-to-time-domain converter, can be reused by the context state determiner 170 as information about a fundamental frequency change between subsequent audio frames, or to derive information about a fundamental frequency change between subsequent audio frames.

[0059] Likewise, adapting the determination of the context state to the change of the fundamental frequency between subsequent frames does not even require any additional side information. Likewise, the audio signal decoder 150 brings with it an enhanced coding efficiency of the context-based spectral value decoding (and allows for an enhanced coding efficiency at the side of the audio signal encoder 100) without requiring any side information, which constitutes a significant improvement of the bit rate efficiency.

[0060] In addition, it should be noted that different concepts can be used to adapt the determination of the context state to a change in the fundamental frequency between subsequent frames (that is, from a first audio frame to a second subsequent audio frame). For example, a context memory structure, whose entries are based on the decoded spectral values 162, can be adapted, for example, using a frequency scaling (for example, a frequency stretching or a frequency compression) before the context state information 164 is derived from the frequency-scaled context memory structure by the context state determiner 170. Alternatively, however, a different algorithm can be used by the context state determiner 170 to derive the context state information 164. For example, it can be adapted which entries of a context memory structure are used to determine a context state for decoding a codeword having a certain codeword frequency index. Even though the latter concept is not described in detail here, it can certainly be applied in some embodiments according to the invention. Also, different concepts can be applied to determine the fundamental frequency change.

3. AUDIO SIGNAL ENCODER ACCORDING TO FIGURE 2a

[0061] Figure 2a shows a schematic block diagram of an audio signal encoder 200, according to an embodiment of the invention. It should be noted that the audio signal encoder 200 according to Figure 2a is very similar to the audio signal encoder 100 according to Figure 1a, so that identical means and signals are designated with identical reference numbers and will not be explained in detail again.

[0062] The audio signal encoder 200 is configured to receive an input audio signal 110 and to provide, based on this, an encoded audio signal representation 112. Optionally, the audio signal encoder 200 is also configured to receive externally generated time warp information 214.

[0063] The audio signal encoder 200 comprises a frequency domain representation provider 120, the functionality of which may be identical to the functionality of the frequency domain representation provider 120 of the audio signal encoder 100.
The frequency domain representation provider 120 provides a frequency domain representation that represents a time-warped version of the input audio signal 110; this frequency domain representation is designated 124. The audio signal encoder 200 also comprises a context-based spectral value encoder 130 and a context state determiner 140, which operate as discussed in relation to the audio signal encoder 100. Likewise, the context-based spectral value encoder 130 provides codewords (for example, acod_m), each codeword representing one or more spectral values of the encoded spectrum representation, or at least a part of a numerical representation of one or more spectral values.

[0064] The audio signal encoder optionally comprises a time warp analyzer or fundamental frequency analyzer or pitch analyzer 220, which is configured to receive the input audio signal 110 and to provide, based on this, time warp contour information 222, which describes, for example, a time warp to be applied by the frequency domain representation provider 120 to the input audio signal 110, in order to compensate for a fundamental frequency change during an audio frame and/or a temporal evolution of a fundamental frequency of the input audio signal 110 and/or a temporal evolution of a pitch of the input audio signal 110. The audio signal encoder 200 also comprises a time warp contour encoder 224, which is configured to provide encoded time warp information 226 based on the time warp contour information 222. The encoded time warp information 226 is preferably included in the encoded audio signal representation 112 and can, for example, take the form of (encoded) time warp ratio values "tw_ratio[i]".

[0065] Furthermore, it should be noted that the time warp contour information 222 can be provided to the frequency domain representation provider 120 and also to the context state determiner 140.
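The time warp contour is transmitted as values at a few equally spaced warp nodes, with intermediate values obtained by linear interpolation (compare Figure 9). A minimal Python sketch of such an interpolation, with hypothetical function and parameter names:

```python
def interpolate_warp_contour(node_values, interp_dist):
    """Linearly interpolate a warp contour between equally spaced
    warp nodes (one decoded warp value per node).

    node_values: warp values at the node positions.
    interp_dist: number of samples between two adjacent nodes.

    Returns a per-sample warp contour of length
    (len(node_values) - 1) * interp_dist + 1.
    """
    contour = []
    for i in range(len(node_values) - 1):
        a, b = node_values[i], node_values[i + 1]
        for j in range(interp_dist):
            contour.append(a + (b - a) * j / interp_dist)
    contour.append(node_values[-1])  # include the final node value
    return contour
```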
[0066] The audio signal encoder 200 may additionally comprise a psychoacoustic model processor 228, which is configured to receive the input audio signal 110, or a preprocessed version thereof, and to perform a psychoacoustic analysis, to determine, for example, temporal masking effects and/or frequency masking effects. Likewise, the psychoacoustic model processor 228 can provide control information 230, which represents, for example, a psychoacoustic relevance of different frequency ranges of the input audio signal, as is known for frequency domain audio encoders.

[0067] In the following, the signal path of the frequency domain representation provider 120 will be briefly described. The frequency domain representation provider 120 comprises an optional preprocessor 120a, which can optionally preprocess the input audio signal 110, to provide a preprocessed version 120b of the input audio signal 110. The frequency domain representation provider 120 also comprises a sampler/resampler 120c configured to sample or resample the input audio signal 110, or the preprocessed version 120b thereof, depending on sampling position information 120d received from a sampling position calculator 120e. Likewise, the sampler/resampler 120c can apply a time-varying sampling or resampling to the input audio signal 110 (or the preprocessed version 120b thereof). By applying this time-varying sampling (with temporally varying time distances between effective sample points), a sampled or resampled time domain representation 120f is obtained, in which a temporal variation of a pitch or fundamental frequency is reduced when compared to the input audio signal 110. The sampling positions are calculated by the sampling position calculator 120e depending on the time warp contour information 222.
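The computation of time-varying sampling positions from a warp contour can be sketched as follows. This is a hypothetical Python illustration only; the actual algorithm corresponds to the pseudo-code of Figure 11 and is not reproduced here. The running sum of the (positive) warp contour is used as a discretized time warp map, which is then inverted by linear interpolation to find the non-equidistant input positions corresponding to equidistant positions in warped time:

```python
def compute_sampling_positions(warp_contour, num_out):
    """Map num_out equidistant positions in warped time back to
    non-equidistant positions in the input frame; regions with a
    larger warp value are sampled more densely."""
    # cumulative warped time at each input sample boundary
    warped = [0.0]
    for w in warp_contour:   # warp values assumed positive
        warped.append(warped[-1] + w)
    total = warped[-1]
    step = total / num_out
    positions = []
    k = 0
    for i in range(num_out):
        target = i * step
        while warped[k + 1] < target:
            k += 1
        # linear inverse interpolation between input samples k and k+1
        frac = (target - warped[k]) / (warped[k + 1] - warped[k])
        positions.append(k + frac)
    return positions
```

For a constant warp contour the positions degenerate to the ordinary equidistant sampling grid.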
The frequency domain representation provider 120 also comprises a windower 120g, where the windower 120g is configured to window the sampled or resampled time domain representation 120f provided by the sampler/resampler 120c. The windowing is carried out in order to reduce or eliminate blocking artifacts, thus allowing for a smooth overlap-and-add operation in an audio signal decoder. The frequency domain representation provider 120 also comprises a time-domain-to-frequency-domain converter 120i which is configured to receive the windowed and sampled/resampled time domain representation 120h and to provide, based on this, a frequency domain representation 120j which can, for example, comprise a set of spectral coefficients per audio frame of the input audio signal 110 (where the audio frames of the input audio signal can, for example, be overlapping or non-overlapping, and where an overlap of approximately 50% is preferred in some embodiments for overlapping audio frames). However, it should be noted that, in some embodiments, a plurality of sets of spectral coefficients can be provided for a single audio frame.

[0068] The frequency domain representation provider 120 optionally comprises a spectral processor 120k that is configured to perform a temporal noise shaping and/or a long-term prediction and/or any other form of spectral post-processing, to thereby obtain a postprocessed frequency domain representation 120l.

[0069] The frequency domain representation provider 120 optionally comprises a scaler/quantizer 120m, where the scaler/quantizer 120m can, for example, be configured to scale different frequency bins (or frequency ranges) of the frequency domain representation 120j, or of the postprocessed version 120l thereof, according to the control information 230 provided by the psychoacoustic model processor 228. Likewise, frequency bins (or frequency bands, which comprise a plurality of frequency bins) can, for example, be scaled according to their psychoacoustic relevance, so that, in an efficient manner, frequency bins (or frequency ranges) having a high psychoacoustic relevance are encoded with high precision by the context-based spectral value encoder, while frequency bins (or frequency bands) having a low psychoacoustic relevance are encoded with low precision. In addition, it should be noted that the control information 230 can optionally adjust parameters of the windower, of the time-domain-to-frequency-domain converter and/or of the spectral post-processing. Also, the control information 230 can be included, in an encoded form, in the encoded audio signal representation 112, as is known to the skilled person.

[0070] Regarding the functionality of the audio signal encoder 200, it can be said that a time warp (in the sense of a time-varying sampling or resampling) is applied by the sampler/resampler 120c according to the time warp contour information 222. Likewise, it is possible to obtain a frequency domain representation 120j having pronounced spectral peaks and valleys even in the presence of an input audio signal having a temporal pitch variation, which would, in the absence of the time-varying sampling/resampling, result in a smeared spectrum. In addition, the derivation of the context state for use by the context-based spectral value encoder 130 is adapted depending on a change in a fundamental frequency between subsequent audio frames, which results in a particularly high coding efficiency, as discussed above. In addition, the time warp contour information 222, which serves as a basis both for the computation of the sampling positions for the sampler/resampler 120c and for the adaptation of the context state determination, is encoded using the time warp contour encoder 224, so that encoded time warp information 226 describing the time warp contour information 222 is included in the encoded audio signal representation 112. Likewise, the encoded audio signal representation 112 provides the information necessary for an efficient decoding of the encoded input audio signal 110 at the side of an audio signal decoder.

[0071] Furthermore, it should be noted that the individual components of the audio signal encoder 200 can substantially reverse the functionality of the individual components of the audio signal decoder 240, which will be described below with reference to Figure 2b. In addition, reference is also made to the detailed discussion of the functionality of the audio signal decoder throughout this description, which also facilitates an understanding of the audio signal encoder.

[0072] It should also be noted that substantial modifications can be made to the audio signal encoder and its individual components. For example, some functionalities can be combined, such as the sampling/resampling, the windowing and the time-domain-to-frequency-domain conversion. In addition, additional processing steps can be introduced where appropriate.

[0073] In addition, the encoded audio signal representation can, of course, comprise additional side information as needed or desired.

4. AUDIO SIGNAL DECODER ACCORDING TO FIGURE 2b

[0074] Figure 2b shows a schematic block diagram of an audio signal decoder 240, according to an embodiment of the invention. The audio signal decoder 240 can be very similar to the audio signal decoder 150 according to Figure 1b, so that identical means and signals are designated with identical reference numbers and will not be discussed in detail again.

[0075] The audio signal decoder 240 is configured to receive an encoded audio signal representation 152, for example, in the form of a bit stream.
The encoded audio signal representation 152 comprises an encoded spectrum representation, for example, in the form of codewords (for example, acod_m) representing one or more spectral values, or at least a part of a numerical representation of one or more spectral values. The encoded audio signal representation 152 also comprises encoded time warp information. In addition, the audio signal decoder 240 is configured to provide a decoded audio signal representation 154, for example, a time domain representation of the audio content.

[0076] The audio signal decoder 240 comprises a context-based spectral value decoder 160, which is configured to receive codewords that represent spectral values of the encoded audio signal representation 152 and to provide, based on these, decoded spectral values 162. In addition, the audio signal decoder 240 also comprises a context state determiner 170, which is configured to provide context state information 164 to the context-based spectral value decoder 160. The audio signal decoder 240 also comprises a time-warping frequency-domain-to-time-domain converter 180, which receives the decoded spectral values 162 and which provides the decoded audio signal representation 154.

[0077] The audio signal decoder 240 also comprises a time warp calculator (or time warp decoder) 250, which is configured to receive the encoded time warp information, which is included in the encoded audio signal representation 152, and to provide, based on this, decoded time warp information 254. The encoded time warp information can, for example, comprise "tw_ratio[i]" codewords that describe a temporal variation of a fundamental frequency or pitch. The decoded time warp information 254 can, for example, take the form of warp contour information. For example, the decoded time warp information 254 may comprise "warp_value_tbl[tw_ratio[i]]" or Preifn values, as will be discussed in detail below.
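The mapping of a codeword index tw_ratio[i] onto a warp value (compare Figure 8) is a simple table lookup. The following Python sketch uses placeholder table values; the actual entries of warp_value_tbl are defined by Figure 8 and are not reproduced here, and the chaining helper is a hypothetical illustration:

```python
# Placeholder values only; the real warp_value_tbl of Figure 8 differs.
WARP_VALUE_TBL = [0.98, 0.99, 1.00, 1.01, 1.02]

def decode_warp_ratio(tw_ratio_index):
    """Mirror warp_value_tbl[tw_ratio[i]]: map a decoded codeword
    index onto a time warp ratio for one contour node."""
    return WARP_VALUE_TBL[tw_ratio_index]

def warp_node_values(tw_ratio_indices, start=1.0):
    """Chain the per-node ratios into warp values at the equally
    spaced warp nodes (illustrative normalization only)."""
    values = [start]
    for idx in tw_ratio_indices:
        values.append(values[-1] * decode_warp_ratio(idx))
    return values
```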
Optionally, the audio signal decoder 240 also comprises a time warp contour calculator 256, which is configured to derive time warp contour information 258 from the decoded time warp information 254. The time warp contour information 258 can, for example, serve as input information for the context state determiner 170, and also for the time-warping frequency-domain-to-time-domain converter 180.

[0078] In the following, some details regarding the time-warping frequency-domain-to-time-domain converter will be described. The converter 180 can optionally comprise an inverse quantizer/scaler 180a, which can be configured to receive the decoded spectral values 162 from the context-based spectral value decoder 160 and to provide an inversely quantized and/or scaled version 180b of the decoded spectral values 162. For example, the inverse quantizer/scaler 180a can be configured to perform an operation that is at least approximately inverse to the operation of the optional scaler/quantizer 120m of the audio signal encoder 200. Likewise, the optional inverse quantizer/scaler 180a can receive control information that may correspond to the control information 230.

[0079] The time-warping frequency-domain-to-time-domain converter 180 optionally comprises a spectral preprocessor 180c which is configured to receive the decoded spectral values 162, or the inversely quantized and scaled spectral values 180b, and to provide, based on these, preprocessed spectral values 180d. For example, the spectral preprocessor 180c can perform an operation that is inverse when compared to the operation of the spectral postprocessor 120k of the audio signal encoder 200.

[0080] The time-warping frequency-domain-to-time-domain converter 180 also comprises a frequency-domain-to-time-domain converter 180e, which is configured to receive the decoded spectral values 162, the inversely quantized/scaled spectral values 180b or the preprocessed spectral values 180d, and to provide, based on these, a time domain representation 180f. For example, the frequency-domain-to-time-domain converter can be configured to perform an inverse spectral-domain-to-time-domain transform, for example, an inverse modified discrete cosine transform (IMDCT). The frequency-domain-to-time-domain converter 180e can, for example, provide a time domain representation of an audio frame of the encoded audio signal based on a set of decoded spectral values or, alternatively, based on a plurality of sets of decoded spectral values. The audio frames of the encoded audio signal can, for example, overlap in time in some cases, and may not overlap in some other cases.

[0081] The time-warping frequency-domain-to-time-domain converter 180 also comprises a windower 180g, which is configured to window the time domain representation 180f and to provide a windowed time domain representation 180h based on the time domain representation 180f provided by the frequency-domain-to-time-domain converter 180e.

[0082] The time-warping frequency-domain-to-time-domain converter 180 also comprises a resampler 180i, which is configured to resample the windowed time domain representation 180h and to provide, based on this, a windowed and resampled time domain representation 180j. The resampler 180i is configured to receive sampling position information 180k from a sampling position calculator 180l. Likewise, the resampler 180i provides a windowed and resampled time domain representation 180j for each frame of the encoded audio signal representation, where subsequent frames may overlap.

[0083] Likewise, an overlap-adder 180m receives the windowed and resampled time domain representations 180j of the subsequent audio frames of the encoded audio signal representation 152 and overlaps and adds said windowed and resampled time domain representations 180j in order to obtain smooth transitions between subsequent audio frames.

[0084] The time-warping frequency-domain-to-time-domain converter optionally comprises a time domain postprocessor 180o configured to perform a post-processing based on a combined audio signal 180n provided by the overlap-adder 180m.

[0085] The time warp contour information 258 serves as input information for the context state determiner 170, which is configured to adapt the derivation of the context state information 164 depending on the time warp contour information 258. In addition, the sampling position calculator 180l of the time-warping frequency-domain-to-time-domain converter 180 also receives the time warp contour information and provides the sampling position information 180k based on said time warp contour information 258, in order to adapt the time-varying resampling performed by the resampler 180i to the time warp contour described by the time warp contour information. Likewise, a pitch variation is introduced into the time domain signal described by the time domain representation 180f, according to the time warp contour described by the time warp contour information 258. Thus, it is possible to provide a time domain representation 180j of an audio signal having a significant pitch variation over time (or a significant change of the fundamental frequency over time) based on a sparse spectrum 180d having pronounced peaks and valleys. Such a spectrum can be encoded with a high bit rate efficiency and, consequently, results in a comparatively low bit rate demand of the encoded audio signal representation 152.
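The overlap-and-add performed by block 180m can be sketched as follows (a minimal Python illustration; frame handling such as transition lengths and window-sequence switching is omitted):

```python
def overlap_add(frames, hop):
    """Overlap and add windowed, resampled frames: consecutive frames
    are offset by `hop` samples and summed in the overlapping region
    to obtain smooth transitions between frames."""
    if not frames:
        return []
    out_len = hop * (len(frames) - 1) + len(frames[-1])
    out = [0.0] * out_len
    for i, frame in enumerate(frames):
        for j, x in enumerate(frame):
            out[i * hop + j] += x   # sum contributions in the overlap
    return out
```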
[0086] In addition, the context (or, more generally, the derivation of the context state information 164) is also adapted depending on the time warp contour information 258 using the context state determiner 170. Likewise, the encoded time warp information 252 is reused twice and contributes to an improvement of the coding efficiency both by allowing the encoding of a sparse spectrum and by allowing the adaptation of the context state information to the specific characteristics of the spectrum in the presence of a time warp, or of a drift of the fundamental frequency over time.

[0087] Additional details regarding the functionality of the individual components of the audio signal decoder 240 will be described below.

5. ARITHMETIC ENCODER ACCORDING TO FIGURE 2c

[0088] In the following, an arithmetic encoder 290 will be described, which can take the place of the context-based spectral value encoder 130 in combination with the context state determiner 140 in the audio signal encoder 100 or in the audio signal encoder 200. The arithmetic encoder 290 is configured to receive spectral values 291 (for example, spectral values of the frequency domain representation 124) and to provide codewords 292a, 292b based on those spectral values 291.

[0089] In other words, the arithmetic encoder 290 can, for example, be configured to receive a plurality of postprocessed, scaled and quantized spectral values 291 of the frequency domain audio representation 124. The arithmetic encoder comprises a most significant bit plane extractor 290a, which is configured to extract a most significant bit plane m from a spectral value. It should be noted here that the most significant bit plane can comprise one or even more bits (for example, two or three bits), which are the most significant bits of the spectral value.

[0090] Thus, the most significant bit plane extractor 290a provides a most significant bit plane value m 290b of a spectral value.
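The extraction of the most significant bit plane value m can be sketched as follows; the plane width of two bits and the helper name are assumptions for illustration only:

```python
def most_significant_plane(mag, plane_bits=2):
    """Extract the most significant bit plane value m of a
    non-negative spectral magnitude: shift out less significant
    planes until the value fits into `plane_bits` bits. Returns m
    and the number of less significant bit planes shifted out."""
    removed = 0
    while mag >= (1 << plane_bits):
        mag >>= 1        # peel off one less significant bit plane
        removed += 1
    return mag, removed
```

For a magnitude of 13 and a two-bit plane, two less significant planes are removed and the most significant plane value is 3 (binary 11).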
The arithmetic encoder 290 also comprises a first password determiner 290c, which is configured to determine an arithmetic password acod_m [ki] [m] representing the most significant bit plane value m. [0091] Optionally, the first password determiner 290c can also provide one or more escape passwords (also referred to here as "ARITH_ESCAPE") that indicate, for example, how many less significant bit planes are available (and, consequently, that indicate the numerical weight of the most significant bit plane). The first password determiner 290c can be configured to provide the password associated with a most significant bit plane value m using a selected cumulative sequence table having (or being referenced by) a cumulative sequence table index pki. [0092] In order to determine which cumulative sequence table should be selected, the arithmetic encoder preferably comprises a status tracker 290d, which can, for example, have the function of the context state determiner 140. The status tracker 290d is configured to track the state of the arithmetic encoder, for example, by observing which spectral values were previously encoded. The status tracker 290d consequently provides status information 290e, which can be equivalent to the context status information 134, for example, in the form of a state value sometimes designated "s" or "t" (where the state value s must not be confused with the frequency expansion factor s). [0093] The arithmetic encoder 290 also comprises a cumulative sequence table selector 290f, which is configured to receive the status information 290e and to provide information 290g describing the selected cumulative sequence table to the password determiner 290c. For example, the cumulative sequence table selector 290f can provide a cumulative sequence table index "pki" that describes which cumulative sequence table, out of a set of, for example, 64 cumulative sequence tables, is selected for use by the password determiner 290c.
Alternatively, the cumulative sequence table selector 290f can provide the entire selected cumulative sequence table to the password determiner 290c. Thus, the password determiner 290c can use the selected cumulative sequence table for the provision of the password acod_m [ki] [m] of the most significant bit plane value m, so that the actual password acod_m [ki] [m] which encodes the most significant bit plane value m is dependent on the value of m and on the cumulative sequence table index pki, and consequently on the current status information 290e. Additional details regarding the encoding process and the password format obtained will be described below. In addition, details regarding the operation of the status tracker 290d, which is equivalent to the context state determiner 140, will be discussed below. [0094] The arithmetic encoder 290 further comprises a less significant bit plane extractor 290h, which is configured to extract one or more less significant bit planes from the scaled and quantized frequency domain audio representation 291, if one or more of the spectral values to be encoded exceed the range of values encodable using only the most significant bit plane. The less significant bit planes can comprise one or more bits, as desired. Likewise, the less significant bit plane extractor 290h provides less significant bit plane information 290i. [0095] The arithmetic encoder 290 also comprises a second password determiner 290j, which is configured to receive the less significant bit plane information 290i and to provide, based on this, zero, one or even more passwords "acod_r" that represent the content of zero, one or more less significant bit planes. The second password determiner 290j can be configured to apply an arithmetic encoding algorithm or any other encoding algorithm in order to derive the less significant bit plane passwords "acod_r" from the less significant bit plane information 290i.
[0096] It should be noted here that the number of less significant bit planes may vary depending on the value of the scaled and quantized spectral values 291, so that there may be no less significant bit plane if the scaled and quantized spectral value to be encoded is comparatively small, so that there may be one less significant bit plane if the current scaled and quantized spectral value to be encoded is of medium magnitude, and so that there may be more than one less significant bit plane if the scaled and quantized spectral value to be encoded has a comparatively large value. [0097] To summarize the above, the arithmetic encoder 290 is configured to encode scaled and quantized spectral values, which are described by the information 291, using a hierarchical encoding process. The most significant bit plane (comprising, for example, one, two or three bits per spectral value) is encoded to obtain an arithmetic password "acod_m [ki] [m]" from a most significant bit plane value. One or more less significant bit planes (each of the less significant bit planes comprising, for example, one, two or three bits) are encoded to obtain one or more passwords "acod_r". When encoding the most significant bit plane, the value m of the most significant bit plane is mapped to a password acod_m [ki] [m]. Sixty-four different cumulative sequence tables are available for encoding the value m depending on a state of the arithmetic encoder 290, that is, depending on the previously encoded spectral values. Likewise, the password "acod_m [ki] [m]" is obtained. In addition, one or more passwords "acod_r" are provided and included in the bit stream if one or more less significant bit planes are present.
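The hierarchical split into a most significant bit plane value m and zero or more less significant bit planes, as summarized above, can be sketched as follows (a simplified Python illustration; the function name and the handling of signs are assumptions of this sketch, and the arithmetic coding of the resulting values into the passwords "acod_m" and "acod_r" is omitted):

```python
def split_bit_planes(value, msb_bits=2):
    """Split a non-negative quantized spectral value into a most significant
    bit plane value m (of msb_bits bits) and a list of less significant bit
    planes; each removed LSB plane corresponds to one escape password."""
    lsb_planes = []
    while value >= (1 << msb_bits):       # value does not fit into m alone
        lsb_planes.insert(0, value & 1)   # peel off one less significant bit plane
        value >>= 1
    return value, lsb_planes              # m, LSB planes (most significant first)
```

A small value thus produces no LSB planes at all, while a large value produces several, which matches the variable number of "acod_r" passwords described above.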
[0098] However, according to the present invention, the derivation of the status information 290e, which is equivalent to the context status information 134, is adapted to changes in a fundamental frequency from a first audio structure to a subsequent second audio structure (that is, between two subsequent audio structures). Details regarding this adaptation, which can be performed by the status tracker 290d, will be described below. 6. ARITHMETIC DECODER, ACCORDING TO FIGURE 2d [0099] Figure 2d shows a schematic block diagram of an arithmetic decoder 295, which can take the place of the context-based spectral value decoder 160 and the context state determiner 170 in the audio signal decoder 150, according to Figure 1d, and in the audio signal decoder 240, according to Figure 2b. [0100] The arithmetic decoder 295 is configured to receive an encoded frequency domain representation 296, which may comprise, for example, arithmetically encoded spectral data in the form of passwords "acod_m" and "acod_r". The encoded frequency domain representation 296 can be equivalent to the passwords input into the context-based spectral value decoder 160. Furthermore, the arithmetic decoder is configured to provide a decoded frequency domain audio representation 297, which can be equivalent to the decoded spectral values 162 provided by the context-based spectral value decoder 160. [0101] The arithmetic decoder 295 comprises a most significant bit plane determiner 295a, which is configured to receive the arithmetic password acod_m [ki] [m] that describes the most significant bit plane value m. The most significant bit plane determiner 295a can be configured to use a cumulative sequence table from a set comprising a plurality of, for example, 64 cumulative sequence tables to derive the most significant bit plane value m from the arithmetic password "acod_m [ki] [m]".
[0102] The most significant bit plane determiner 295a is configured to derive values 295b of the most significant bit plane of spectral values based on the password "acod_m". The arithmetic decoder 295 further comprises a less significant bit plane determiner 295c, which is configured to receive one or more passwords "acod_r" that represent one or more less significant bit planes of a spectral value. Likewise, the less significant bit plane determiner 295c is configured to provide decoded values 295d of one or more less significant bit planes. The arithmetic decoder 295 also comprises a bit plane combiner 295e, which is configured to receive the decoded values 295b of the most significant bit plane of spectral values and the decoded values 295d of one or more less significant bit planes of spectral values, if such less significant bit planes are available for the current spectral values. Likewise, the bit plane combiner 295e provides the decoded spectral values, which are part of the decoded frequency domain audio representation 297. Naturally, the arithmetic decoder 295 is typically configured to provide a plurality of spectral values in order to obtain a complete set of decoded spectral values associated with a current structure of the audio content. [0103] The arithmetic decoder 295 further comprises a cumulative sequence table selector 295f, which is configured to select, for example, one of the 64 cumulative sequence tables depending on a state index 295g that describes a state of the arithmetic decoder 295. The arithmetic decoder 295 further comprises a state tracker 295h, which is configured to track a state of the arithmetic decoder depending on the previously decoded spectral values. The state tracker 295h can correspond to the context state determiner 170.
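The reconstruction performed by the bit plane combiner 295e can be sketched as the inverse of the encoder-side split (a simplified Python illustration; the function name and the plane ordering are assumptions of this sketch):

```python
def combine_bit_planes(m, lsb_planes):
    """Rebuild a quantized spectral value from the decoded most significant
    bit plane value m and the decoded less significant bit planes (given
    with the most significant LSB plane first)."""
    value = m
    for bit in lsb_planes:
        value = (value << 1) | bit  # append one less significant bit plane
    return value
```

When no less significant bit planes are available for a spectral value, the value is simply the decoded most significant bit plane value m itself.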
Details regarding the state tracker 295h will be described below. [0104] Likewise, the cumulative sequence table selector 295f is configured to provide an index (for example, pki) of a selected cumulative sequence table, or a selected cumulative sequence table itself, for application in the decoding of the most significant bit plane value m depending on the password "acod_m". [0105] Likewise, the arithmetic decoder 295 exploits the different probabilities of different combinations of values of the most significant bit plane of adjacent spectral values. Different cumulative sequence tables are selected and applied depending on the context. In other words, statistical dependencies between spectral values are exploited by selecting different cumulative sequence tables, from a set comprising, for example, 64 different cumulative sequence tables, depending on a state index 295g (which can be equivalent to the context state information 164), which is obtained by observing the previously decoded spectral values. A spectral scaling is considered when adapting the derivation of the state index 295g (or of the context state information 164) depending on information about a change in fundamental frequency (or timbre) between subsequent audio structures. 7. OVERVIEW OF THE CONTEXT ADAPTATION CONCEPT [0106] The following is an overview of the concept of adapting the context of an arithmetic encoder using time deformation information. 7.1 BACKGROUND INFORMATION [0107] In the following, some background information will be provided in order to facilitate the understanding of the present invention. It should be noted that, in Reference [3], a context-adaptive arithmetic encoder (see, for example, Reference [5]) is used to encode the quantized spectral boxes in a lossless manner. [0108] The context used is described in Figure 3a, which presents a graphical representation of this context-adaptive arithmetic coding.
In Figure 3a, it can be seen that the boxes already decoded from the previous structure are used to determine the context for the frequency boxes that must be decoded. It should be noted here that it does not matter for the described invention whether the context and the coding are organized in quadruples, in a linear fashion, or in other n-tuples, where n can vary. [0109] Referring again to Figure 3a, which presents a context-adaptive arithmetic encoding or decoding, it should be noted that an abscissa 310 describes time and an ordinate 312 describes frequency. It should be noted here that quadruples of spectral values are encoded using a common context state, according to the context shown in Figure 3a. For example, a context for decoding a quadruple 320 of spectral values associated with an audio structure having time index k and frequency index i is based on the spectral values of a first quadruple 322 having time index k and frequency index i - 1, a second quadruple 324 having time index k - 1 and frequency index i - 1, a third quadruple 326 having time index k - 1 and frequency index i, and a fourth quadruple 328 having time index k - 1 and frequency index i + 1. It should be noted that each of the frequency indices i - 1, i, i + 1 designates (or, more precisely, is associated with) four frequency boxes of the time domain to frequency domain conversion or of the frequency domain to time domain conversion. Likewise, the context for decoding the quadruple 320 is based on the spectral values of the quadruples 322, 324, 326, 328 of spectral values. Likewise, spectral values having tuple frequency indices i - 1, i and i + 1 of the previous audio structure having time index k - 1 are used to derive the context for decoding spectral values having tuple frequency index i of the current audio structure having time index k (typically in combination with spectral values having tuple frequency index i - 1 of the currently decoded audio structure having time index k).
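The selection of the neighboring quadruples 322, 324, 326, 328 that form the context for the quadruple 320 can be sketched as follows (a Python illustration under the assumption that each structure's spectrum is stored as a list of 4-tuples; the function name and the all-zero boundary handling are assumptions of this sketch):

```python
def context_quadruples(prev_frame, cur_frame, i):
    """Collect the four neighbor quadruples used as context for decoding
    the quadruple with tuple frequency index i of the current frame k."""
    def get(frame, idx):
        # neighbors outside the spectrum contribute an all-zero quadruple
        return frame[idx] if 0 <= idx < len(frame) else (0, 0, 0, 0)
    return {
        "k,i-1": get(cur_frame, i - 1),     # quadruple 322
        "k-1,i-1": get(prev_frame, i - 1),  # quadruple 324
        "k-1,i": get(prev_frame, i),        # quadruple 326
        "k-1,i+1": get(prev_frame, i + 1),  # quadruple 328
    }
```

The context state for the quadruple 320 would then be derived from these four neighbor quadruples, three of which come from the previous structure.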
[0110] It has been found that the time-warped transform typically leads to a better energy compaction for harmonic signals with varying fundamental frequencies, leading to spectra that have a clear harmonic structure rather than the more or less smeared partials that would occur if no time warping were applied. Another effect of the time warping is caused by possibly different local average sampling frequencies of consecutive structures. This effect has been found to cause the consecutive spectra of a signal with an otherwise constant harmonic structure, but with a varying fundamental frequency, to be extended along the frequency axis. [0111] A lower graph 390 in Figure 3c shows such an example. It contains graphs (for example, of a magnitude in dB as a function of a frequency box index) of two consecutive structures (for example, structures designated as "last structure" and "that structure"), where a harmonic signal with a varying fundamental frequency is encoded by a time-warped modified discrete cosine transform encoder (TW-MDCT encoder). [0112] The evolution of the corresponding relative pitch can be found in a graph 370 in Figure 3b, which shows a decreasing relative pitch and, therefore, an increasing relative frequency of the harmonic lines. [0113] This leads to an increased frequency of the harmonic lines after the application of the time warping algorithm (for example, a time warping sampling or resampling). It can be clearly seen that the spectrum of the current structure (also referred to as "that structure") is an approximate copy of the spectrum of the last structure, but extended along the frequency axis 392 (marked in terms of frequency boxes of the modified discrete cosine transform).
This would also mean that, if the previous structure (also referred to as the "last structure") is used as a context for the arithmetic encoder (for example, for decoding the spectral values of the current structure, which is also referred to as "that structure"), the context would be suboptimal, since the corresponding partials would now occur in different frequency boxes. [0114] An upper graph 380 in Figure 3c shows this (for example, a bit demand for encoding spectral values using a context-dependent arithmetic encoding) compared to a Huffman encoding scheme, which is normally considered less efficient than an arithmetic encoding scheme. Due to the suboptimal previous context (which can, for example, be defined by the spectral values of the "last structure", which are represented in the graph 390 of Figure 3c), the arithmetic coding scheme spends more bits where partial tones of the current structure are located in areas with low energy in the previous structure, and vice versa. On the other hand, the graph 380 of Figure 3c shows that, if the context is good, which is at least the case for the fundamental partial tone, the bit demand is smaller (for example, when using the context-dependent arithmetic coding) than with the Huffman coding used for comparison. [0115] To summarize the above, the graph 370 in Figure 3b presents an example of a temporal evolution of a relative timbre contour. An abscissa 372 describes the time and an ordinate 374 describes both a relative timbre and a relative frequency. A first curve 376 describes a temporal evolution of the relative timbre, and a second curve 377 describes a temporal evolution of the relative frequency. As can be seen, the relative timbre decreases over time, while the relative frequency increases over time.
In addition, it should be noted that a time extension 378a of a previous structure (also referred to as "the last structure") and a time extension 378b of a current structure (also referred to as "that structure") are shown as non-overlapping in the graph 370 of Figure 3b. However, typically, the time extensions 378a, 378b of subsequent audio structures can be overlapping. For example, the overlap can be approximately 50%. [0116] Referring now to Figure 3c, it should be noted that the graph 390 presents MDCT spectra for the two subsequent structures. An abscissa 392 describes the frequency in terms of frequency boxes of the modified discrete cosine transform. An ordinate 394 describes a relative magnitude (in terms of decibels) of the individual spectral boxes. As can be seen, the spectral peaks of the spectrum of the current structure ("that structure") are shifted in frequency (in a frequency dependent manner) in relation to the corresponding spectral peaks of the spectrum of the previous structure ("last structure"). Likewise, it was found that a context for the context-based coding of the spectral values of the current structure is not well adapted if said context is formed based on the original version of the spectral values of the previous audio structure, as the spectral peaks of the spectrum of the current structure do not coincide (in terms of frequency) with the spectral peaks of the spectrum of the previous audio structure. Thus, a bit rate demand for the context-based encoding of the spectral values is comparatively high, and may be even greater in the case of a context-based Huffman encoding. This can be seen in the graph 380 of Figure 3c, where an abscissa describes the frequency (in terms of boxes of the modified discrete cosine transform), and where an ordinate 384 describes a number of bits necessary for the encoding of the spectral values. 7.2. DISCUSSION OF THE SOLUTION [0117] However, the embodiments according to the present invention provide a solution to the problem discussed above.
It has been found that the timbre variation information can be used to derive an approximation of the frequency extension factor between consecutive spectra of a time-warped modified discrete cosine transform encoder (for example, between spectra of consecutive audio structures). It has been found that this extension factor can then be used to extend the previous context along the frequency axis, to derive a better context and, therefore, to reduce the number of bits needed to encode a frequency line and to increase the coding gain. [0118] It has been found that good results can be achieved if this extension factor is approximately the ratio of the average frequencies of the last structure and of the current structure. Furthermore, it was found that this could be done line by line, or, if the arithmetic encoder encodes n-tuples of lines as one item, tuple by tuple. [0119] In other words, the extension of the context can be done by line (that is, individually for each frequency box of the modified discrete cosine transform) or by tuples (that is, for each tuple, a set of a plurality of spectral boxes of the modified discrete cosine transform). [0120] In addition, the resolution for computing the extension factor can also vary depending on the requirements of the embodiments. 7.3 EXAMPLES FOR DERIVING THE EXTENSION FACTOR [0121] In the following, some concepts for deriving the extension factor will be described in detail. The time-warped modified discrete cosine transform method described in Reference [3], and, alternatively, the time-warped modified discrete cosine transform method described here, provides a so-called smoothed timbre contour as an intermediate information.
This smoothed timbre contour (which can, for example, be described by the entries of the array "warp_contour []" or by the entries of the arrays "new_warp_contour []" and "past_warp_contour []") contains information on the evolution of the relative timbre over several consecutive structures, so that, for each sample within a structure, an estimate of the relative timbre is known. The relative frequency for this sample is then simply the inverse of this relative timbre. [0122] For example, the following relationship can be maintained: frel[n] = 1 / prel[n]. [0123] In the above equation, prel[n] designates the relative timbre for a given time index n, which can be a short-term relative timbre (where the time index n can, for example, designate an individual sample). In addition, frel[n] designates a relative frequency for the time index n, and can be a short-term relative frequency value. 7.3.1 FIRST ALTERNATIVE [0124] The average relative frequency over a structure k (where k is a structure index) can then be described as an arithmetic mean over all the relative frequencies within that structure k: frel,mean,k = (1/N) · Σn=0...N-1 frel[n]. [0125] In the above equation, frel,mean,k designates the average relative frequency over the audio structure having the temporal structure index k. N designates a number of time domain samples of the audio structure having the temporal structure index k. n is a variable that runs over the time domain sample indices n = 0 to n = N - 1 of the time domain samples of the current audio structure having audio structure index k. frel[n] designates the local relative frequency value associated with the time domain sample having time domain sample index n. [0126] From that (that is, the computation of frel,mean,k for the current audio structure, and the computation of frel,mean,k-1 for the previous audio structure), the extension factor s for the current audio structure k can be derived as: s = frel,mean,k / frel,mean,k-1. 7.3.2 SECOND ALTERNATIVE [0127] In the following, another alternative for computing the extension factor s will be described.
A simpler and less accurate approximation of the extension factor s (for example, when compared to the first alternative) can be found if it is taken into account that, on average, the relative pitch is close to one, so that the relationship between the relative pitch and the relative frequency is approximately linear, and so that the inversion step from the relative pitch to the relative frequency can be omitted, using the average relative pitch instead: prel,mean,k = (1/N) · Σn=0...N-1 prel[n]. [0128] In the above equation, prel,mean,k designates an average relative timbre for the audio structure having the temporal audio structure index k. N designates a number of time domain samples of the audio structure having the temporal audio structure index k. The running variable n takes values between 0 and N - 1 and, thus, runs over the time domain samples having temporal indices n of the current audio structure. prel[n] designates a (local) relative timbre value for the time domain sample having time domain index n. For example, the relative timbre value prel[n] can be equal to the entry warp_contour [n] of the deformation contour array warp_contour []. [0129] In this case, the extension factor s for the audio structure having the temporal structure index k can be approximated as: s ≈ prel,mean,k-1 / prel,mean,k. [0130] In the above equation, prel,mean,k-1 designates an average relative timbre value for the audio structure having the temporal audio structure index k - 1, and the variable prel,mean,k designates an average relative timbre value for the audio structure having the temporal audio structure index k. 7.3.3 ADDITIONAL ALTERNATIVES [0131] However, it should be noted that significantly different concepts for the computation or estimation of the extension factor s can be used, where the extension factor s also typically describes a fundamental frequency change between a first audio structure and a subsequent second audio structure.
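The two derivations of the extension factor s given above can be sketched as follows (a Python illustration; the function names are assumptions, and the warp contours are assumed to be given as lists of per-sample relative timbre values prel[n]):

```python
def extension_factor(warp_contour_prev, warp_contour_cur):
    """First alternative: invert each relative timbre value to a relative
    frequency, average per structure, and form s = frel,mean,k / frel,mean,k-1."""
    def mean_relative_frequency(contour):
        return sum(1.0 / p_rel for p_rel in contour) / len(contour)
    return (mean_relative_frequency(warp_contour_cur)
            / mean_relative_frequency(warp_contour_prev))


def extension_factor_approx(warp_contour_prev, warp_contour_cur):
    """Second alternative: omit the per-sample inversion and approximate
    s by the ratio of the mean relative timbres, prel,mean,k-1 / prel,mean,k."""
    def mean_relative_pitch(contour):
        return sum(contour) / len(contour)
    return (mean_relative_pitch(warp_contour_prev)
            / mean_relative_pitch(warp_contour_cur))
```

For a relative timbre close to one, both sketches give nearly the same result; when the timbre halves from one structure to the next, both yield an extension factor of about 2.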
For example, the spectra of the first audio structure and of the subsequent second audio structure can be compared using a pattern comparison concept, in order to derive the extension factor. Nevertheless, it turns out that the computation of the frequency expansion factor s using the deformation contour information, as discussed above, is particularly efficient from a computational point of view, so that this is the preferred option. 8. DETAILS REGARDING THE DETERMINATION OF THE CONTEXT STATE 8.1. EXAMPLE ACCORDING TO FIGURES 4a AND 4b [0132] In the following, details regarding the determination of the context state will be described. To this end, the functionality of the context state determiner 400, whose schematic block diagram is shown in Figure 4a, will be described. [0133] The context state determiner 400 can, for example, take the place of the context state determiner 140 or of the context state determiner 170. Even though the details regarding the context state determiner 400 are described below for the case of an audio signal decoder, the context state determiner 400 can also be used in the context of an audio signal encoder. [0134] The context state determiner 400 is configured to receive information 410 about previously decoded spectral values or about previously encoded spectral values. In addition, the context state determiner 400 receives time warp information or time warp contour information 412. The time warp information or time warp contour information 412 can, for example, be the same as the time warping information 122 and can therefore describe (at least implicitly) a change in a fundamental frequency between subsequent audio structures. The time warp information or time warp contour information 412 can alternatively be equivalent to the time warp information 184 and can therefore describe a change in a fundamental frequency between subsequent structures.
However, the time warp information / time warp contour information 412 may alternatively be equivalent to the time warp contour information 222 or to the time warp contour information 258. Generally speaking, it can be said that the time warp information / time warp contour information 412 can describe the frequency variation between subsequent audio structures directly or indirectly. For example, the time warp information / time warp contour information 412 can describe the warp contour and can, consequently, comprise the entries of the array "warp_contour []", or it can describe the time contour and can, consequently, comprise the entries of the array "time_contour []". [0135] The context state determiner 400 provides a context state value 420, which describes the context to be used for encoding or decoding spectral values of the current structure, and which can be used by the context-based spectral value encoder or by the context-based spectral value decoder for selecting an appropriate mapping rule for encoding or decoding the spectral values of the current audio structure. The context state value 420 can, for example, be equivalent to the context status information 134 or to the context status information 164. [0136] The context state determiner 400 comprises a preliminary context memory structure provider 430, which is configured to provide a preliminary context memory structure 432 such as, for example, the array q[1][]. For example, the preliminary context memory structure provider 430 can be configured to perform the functionality of the algorithms according to Figures 25 and 26, and thereby provide a set of, for example, N/4 entries q[1][i] of the array q[1][] (for i = 0 to i = N/4 - 1).
[0137] Generally speaking, the preliminary context memory structure provider 430 can be configured to provide the entries of the preliminary context memory structure 432 so that an entry having an entry frequency index i is based on one (single) spectral value having frequency index i, or on a set of spectral values having a common frequency index i. However, the preliminary context memory structure provider 430 is preferably configured to provide the preliminary context memory structure 432 so that there is a fixed frequency index relationship between a frequency index of an entry of the preliminary context memory structure 432 and the frequency indices of the one or more encoded spectral values or decoded spectral values on which said entry of the preliminary context memory structure 432 is based. For example, said predetermined index relationship may be such that the entry q[1][i] of the preliminary context memory structure is based on the spectral value of the frequency box having frequency box index i (or i - const, where const is a constant) of the time domain to frequency domain converter or of the frequency domain to time domain converter. Alternatively, the entry q[1][i] of the preliminary context memory structure 432 may be based on the spectral values of the frequency boxes having frequency box indices 2i - 1 and 2i of the time domain to frequency domain converter or of the frequency domain to time domain converter (or a permuted variation of the frequency box indices). Alternatively, however, an entry q[1][i] of the preliminary context memory structure 432 can be based on the spectral values of the frequency boxes having frequency box indices 4i - 3, 4i - 2, 4i - 1 and 4i of the time domain to frequency domain converter or of the frequency domain to time domain converter (or a permuted variation of the frequency box indices).
Thus, each entry of the preliminary context memory structure 432 can be associated with a spectral value of a predetermined frequency index, or with a set of spectral values of predetermined frequency indices, of the audio structures on the basis of which the preliminary context memory structure 432 is adjusted. [0138] The context state determiner 400 also comprises a frequency expansion factor calculator 434, which is configured to receive the time deformation information / time deformation contour information 412 and to provide, based on that, frequency expansion factor information 436. For example, the frequency expansion factor calculator 434 can be configured to derive relative pitch information prel[n] from the entries of the array warp_contour [] (where the relative pitch information prel[n] can, for example, be equal to a corresponding entry of the array warp_contour []). Furthermore, the frequency expansion factor calculator 434 can be configured to apply one of the above equations to derive the frequency expansion factor information s from said relative pitch information over two subsequent audio structures. [0139] Generally speaking, the frequency expansion factor calculator 434 can be configured to provide the frequency expansion factor information (for example, an s value or, equivalently, an m_ContextUpdateRatio value) so that the frequency expansion factor information describes a change in a fundamental frequency between a previously encoded or decoded audio structure and the current audio structure to be encoded or decoded using the current context state value 420. [0140] The context state determiner 400 also comprises a frequency scaled context memory structure provider 438, which is configured to receive the preliminary context memory structure 432 and to provide, based on this, a frequency scaled context memory structure.
For example, the frequency scaled context memory structure can be represented by an updated version of the array q[1][], which can be an updated version of the array that carries the preliminary context memory structure 432. [0141] The frequency scaled context memory structure provider can be configured to derive the frequency scaled context memory structure from the preliminary context memory structure 432 using a frequency scaling. In the frequency scaling, a value of an entry having entry index i of the preliminary context memory structure 432 can be copied to an entry having entry index j of the frequency scaled context memory structure 440, where the frequency index i can be different from the frequency index j. For example, if a frequency extension of the content of the preliminary context memory structure 432 is performed, an entry having entry index j1 of the frequency scaled context memory structure 440 can be adjusted to the value of an entry having entry index i1 of the preliminary context memory structure 432, and an entry having entry index j2 of the frequency scaled context memory structure 440 can be adjusted to the value of an entry having entry index i2 of the preliminary context memory structure 432, where j2 is greater than i2, and where j1 is greater than i1. A ratio between corresponding frequency indices (for example, j1 and i1, or j2 and i2) can take a predetermined value (except for rounding errors).
Similarly, if a frequency compression of the content described by the preliminary context memory structure 432 is to be performed by the frequency-scaled context memory structure provider 438, an entry having entry index j3 of the frequency-scaled context memory structure 440 can be set to the value of an entry having entry index i3 of the preliminary context memory structure 432, and an entry having entry index j4 of the frequency-scaled context memory structure 440 can be set to the value of an entry having entry index i4 of the preliminary context memory structure 432. In this case, the entry index j3 can be smaller than the entry index i3, and the entry index j4 can be smaller than the entry index i4. Moreover, a ratio between corresponding entry indices (for example, between the entry indices j3 and i3, or between the entry indices j4 and i4) can be constant (except for rounding errors), and can be determined by the frequency expansion factor information 436. Further details regarding the operation of the frequency-scaled context memory structure provider 438 will be described below.
[0142] The context state determiner 400 also comprises a context state value provider 442, which is configured to provide the context state value 420 based on the frequency-scaled context memory structure 440. For example, the context state value provider 442 can be configured to provide a context state value 420 that describes the context for decoding a spectral value having frequency index i0 based on entries of the frequency-scaled context memory structure 440 whose frequency indices are in a predetermined relationship with the frequency index i0. For example, the context state value provider 442 can be configured to provide the context state value 420 for decoding the spectral value (or tuple of spectral values) having frequency index i0 based on entries of the frequency-scaled context memory structure 440 having frequency indices i0 - 1, i0 and i0 + 1.
[0143] Likewise, the context state determiner 400 can efficiently provide the context state value 420 for decoding a spectral value (or tuple of spectral values) having frequency index i0 based on entries of the preliminary context memory structure 432 having respective frequency indices smaller than i0 - 1, smaller than i0 and smaller than i0 + 1 if a frequency stretching is performed by the frequency-scaled context memory structure provider 438, and based on entries of the preliminary context memory structure 432 having respective frequency indices greater than i0 - 1, greater than i0 and greater than i0 + 1, respectively, in the case that a frequency compression is performed by the frequency-scaled context memory structure provider 438.
[0144] Thus, the context state determiner 400 is configured to adapt the context determination to a change of a fundamental frequency between subsequent frames by providing the context state value 420 based on a frequency-scaled context memory structure, which is a frequency-scaled version of the preliminary context memory structure 432, scaled in frequency depending on the frequency expansion factor 436, which, in turn, describes a variation of the fundamental frequency over time.
[0145] Figure 4b shows a graphical representation of the determination of the context state, according to an embodiment of the invention. Figure 4b shows a schematic representation of the entries of the preliminary context memory structure 432, which is provided by the preliminary context memory structure provider 430, at reference numeral 450. For example, an entry 450a having frequency index i1 + 1, an entry 450b and an entry 450c having frequency index i2 + 2 are marked.
However, when providing the frequency-scaled context memory structure 440, which is shown at reference numeral 452, an entry 452a having frequency index i1 is set to take the value of the entry 450a having frequency index i1 + 1, and an entry 452c having frequency index i2 - 1 is set to take the value of the entry 450c having frequency index i2 + 2. Similarly, the other entries of the frequency-scaled context memory structure 440 can be set depending on the entries of the preliminary context memory structure 432, where, typically, some of the entries of the preliminary context memory structure are discarded in the case of a frequency compression, and where, typically, some of the entries of the preliminary context memory structure 432 are copied to more than one entry of the frequency-scaled context memory structure 440 in the case of a frequency stretching.
[0146] In addition, Figure 4b illustrates how the context state is determined for decoding spectral values of the audio frame having time index k based on the entries of the frequency-scaled context memory structure 440 (which are represented at reference numeral 452). For example, when determining the context state (represented, for example, by the context state value 420) for decoding the spectral value (or tuple of spectral values) having frequency index i1 of the audio frame having time index k, a context value having frequency index i1 - 1 of the audio frame having time index k and entries of the frequency-scaled context memory structure of the audio frame having time index k - 1 and frequency indices i1 - 1, i1 and i1 + 1 are evaluated. Likewise, entries of the preliminary context memory structure of the audio frame having time index k - 1 and frequency indices i1 - 1, i1 + 1 and i1 + 2 are effectively evaluated to determine the context for decoding the spectral value (or tuple of spectral values) of the audio frame having time index k and frequency index i1. Thus, the environment of spectral values that is used to determine the context state is effectively shifted by the frequency stretching or frequency compression of the preliminary context memory structure (or of its contents).
8.2. IMPLEMENTATION ACCORDING TO FIGURE 4c
[0147] In the following, an example for mapping the context of an arithmetic encoder using quadruples (4-tuples) will be described with reference to Figure 4c, which presents a tuple-wise processing.
[0148] Figure 4c presents a pseudo-code representation of an algorithm for obtaining the frequency-scaled context memory structure (for example, the frequency-scaled context memory structure 440) based on the preliminary context memory structure (for example, the preliminary context memory structure 432).
[0149] The algorithm 460, according to Figure 4c, assumes that the preliminary context memory structure 432 is stored in an array "self->base.m_qbuf". In addition, the algorithm 460 assumes that the frequency expansion factor information 436 is stored in a variable "self->base.m_ContextUpdateRatio".
[0150] In a first step 460a, several variables are initialized. In particular, a target tuple index variable "nLinTupleIdx" and a source tuple index variable "nWarpTupleIdx" are initialized to zero. In addition, a reordering buffer array "Tqi4" is initialized.
[0151] In a step 460b, the entries of the preliminary context memory structure "self->base.m_qbuf" are copied into the reordering buffer array.
[0152] Subsequently, a copy algorithm 460c is repeated as long as both the target tuple index variable and the source tuple index variable are smaller than a variable nTuples, which describes a maximum number of tuples.
[0153] In a step 460ca, four entries of the reordering buffer, whose (tuple) frequency index is determined by a current value of the source tuple index variable (in combination with a first constant index "firstIdx"), are copied into entries of the context memory structure (self->base.m_qbuf[][]), the frequency indices of these entries being determined by the target tuple index variable (nLinTupleIdx) (in combination with the first constant index "firstIdx").
[0154] In a step 460cb, the target tuple index variable is incremented by one.
[0155] In a step 460cc, the source tuple index variable is set to a value which is the product of the current value of the target tuple index variable (nLinTupleIdx) and the frequency expansion factor information (self->base.m_ContextUpdateRatio), rounded to the nearest integer. Accordingly, the value of the source tuple index variable can be greater than the value of the target tuple index variable if the frequency expansion factor is greater than one, and smaller than the value of the target tuple index variable if the frequency expansion factor is smaller than one.
[0156] Accordingly, one value of the source tuple index variable is associated with each value of the target tuple index variable (as long as both the value of the target tuple index variable and the value of the source tuple index variable are smaller than the constant nTuples). Subsequent to the execution of steps 460cb and 460cc, the copying of entries from the reordering buffer into the context memory structure is repeated in step 460ca, using the variable association between a source tuple and a target tuple.
[0157] Thus, the algorithm 460, according to Figure 4c, performs the functionality of the frequency-scaled context memory structure provider 438, wherein the preliminary context memory structure is represented by the initial entries of the array "self->base.m_qbuf", and wherein the frequency-scaled context memory structure 440 is represented by the updated entries of the array "self->base.m_qbuf".
8.3. IMPLEMENTATION ACCORDING TO FIGURES 4d AND 4e
[0158] In the following, an example for mapping the context of an arithmetic encoder using quadruples will be described with reference to Figures 4d and 4e, which present a line-wise processing.
[0159] Figures 4d and 4e show a pseudo-code representation of an algorithm for performing a frequency scaling (i.e., a frequency stretching or a frequency compression) of a context.
[0160] The algorithm 470, according to Figures 4d and 4e, receives, as input information, the array "self->base.m_qbuf[][]" (or at least a reference to said array) and the frequency expansion factor information "self->base.m_ContextUpdateRatio". In addition, the algorithm 470 receives, as input information, a variable "self->base.m_IcsInfo->m_ScaleFactorBandsTransmitted", which describes a number of active lines. Moreover, the algorithm 470 modifies the array self->base.m_qbuf[][], such that the entries of that array represent the frequency-scaled context memory structure.
[0161] The algorithm 470 comprises, in a step 470a, an initialization of a plurality of variables. In particular, a target line index variable (linLineIdx) and a source line index variable (warpLineIdx) are initialized to zero.
[0162] In a step 470b, a number of active tuples and a number of active lines are computed.
[0163] Next, two sets of contexts are processed, which comprise different context indices (designated by the variable "contextIdx"). However, in other embodiments, it is also sufficient to process only one context.
[0164] In a step 470c, a temporary line buffer array "lineTmpBuf" and a line reordering buffer array "lineReorderBuf" are initialized with zero entries.
[0165] In a step 470d, the entries of the preliminary context memory structure associated with the different frequency bins of a plurality of spectral value tuples are copied into the line reordering buffer array. Accordingly, entries of the line reordering buffer array having subsequent frequency indices are set to the entries of the preliminary context memory structure that are associated with the different frequency bins. In other words, the preliminary context memory structure comprises one entry "self->base.m_qbuf[curTuple][contextIdx]" per tuple of spectral values, wherein the entry associated with a tuple of spectral values comprises sub-entries a, b, c, d associated with the individual spectral lines (or spectral bins). Each of the sub-entries a, b, c, d is copied into an individual entry of the line reordering buffer array "lineReorderBuf[]" in step 470d.
[0166] Subsequently, the contents of the line reordering buffer array are copied into the temporary line buffer array "lineTmpBuf[]" in a step 470e.
[0167] Subsequently, the target line index variable and the source line index variable are initialized to take the value of zero in a step 470f.
[0168] Subsequently, entries "lineReorderBuf[warpLineIdx]" of the line reordering buffer array are copied into the temporary line buffer array for a plurality of values of the target line index variable "linLineIdx" in a step 470g. Step 470g is repeated as long as both the target line index variable and the source line index variable are smaller than a variable "activeLines", which indicates a total number of active (non-zero) spectral lines. A temporary line buffer array entry designated by the current value of the target line index variable "linLineIdx" is set to the value of the entry of the line reordering buffer array designated by the current value of the source line index variable.
Subsequently, the target line index variable is incremented by one, and the source line index variable "warpLineIdx" is set to a value that is determined by the product of the current value of the target line index variable and the frequency expansion factor information (represented by the variable "self->base.m_ContextUpdateRatio").
[0169] After the update of the target line index variable and of the source line index variable, step 470g is repeated, as long as both the target line index variable and the source line index variable are smaller than the value of the variable "activeLines".
[0170] Accordingly, the context entries of the preliminary context memory structure are scaled in frequency line-wise, rather than tuple-wise.
[0171] In a final step 470h, a tuple-wise representation is reconstructed on the basis of the per-line entries of the temporary line buffer array. The entries a, b, c, d of a tuple representation "self->base.m_qbuf[curTuple][contextIdx]" of the context are set according to four entries "lineTmpBuf[(curTuple-1)*4+0]" to "lineTmpBuf[(curTuple-1)*4+3]" of the temporary line buffer array, these entries being adjacent in frequency. In addition, a tuple energy field "e" is optionally set so as to represent an energy of the spectral values associated with the respective tuple. Moreover, an additional field "v" of the tuple representation is optionally set if the magnitude of the spectral values associated with said tuple is comparatively small.
[0172] However, it should be noted that the details regarding the computation of the new tuples, which is performed in step 470h, are strongly dependent on the actual representation of the context and may, therefore, vary significantly. Nevertheless, it can generally be said that in step 470h a tuple-based representation is created on the basis of a representation of the frequency-scaled context based on individual lines.
[0173] In short, according to the algorithm 470, a tuple representation of the context (entries of the array "self->base.m_qbuf[curTuple][contextIdx]") is first split into a per-frequency-line (or per-frequency-bin) representation of the context (step 470d). Subsequently, the frequency scaling is performed line-wise (step 470g). Finally, a tuple representation of the context (updated entries of the array "self->base.m_qbuf[curTuple][contextIdx]") is reconstructed (step 470h) on the basis of the line-wise, frequency-scaled information.
9. DETAILED DESCRIPTION OF THE TIME WARPED FREQUENCY-DOMAIN-TO-TIME-DOMAIN DECODING ALGORITHM
9.1. OVERVIEW
[0174] In the following, some of the algorithms performed by an audio decoder, according to an embodiment of the invention, will be described in detail. For this purpose, reference is made to Figures 5a, 5b, 6a, 6b, 7a, 7b, 8, 9, 10a, 10b, 11, 12, 13, 14, 15 and 16.
[0175] First of all, reference is made to Figure 7a, which presents a legend of data element definitions and a legend of help element definitions. In addition, reference is made to Figure 7b, which presents a legend of constant definitions.
[0176] Generally speaking, it can be said that the methods described here can be used for decoding an audio stream that has been encoded using a time-warped modified discrete cosine transform. Thus, when the TW-MDCT is enabled for an audio stream (which can be indicated by a flag, for example a flag referred to as "twMDCT", which can be comprised in a specific configuration information), a time warping filter bank and block switcher can replace a standard filter bank and block switcher in an audio decoder. In addition to the inverse modified discrete cosine transform (IMDCT), the time warping filter bank and block switcher contains a time-domain-to-time-domain mapping from an arbitrarily spaced time grid to a linearly spaced time grid, and a corresponding adaptation of the window shapes.
[0177] It should be noted here that the decoding algorithm described here can be performed, for example, by the time warping frequency-domain-to-time-domain converter 180, based on the encoded representation of the spectrum and also based on the encoded time warping information 184, 252.
9.2. DEFINITIONS
[0178] Regarding the definition of the data elements, help elements and constants, reference is made to Figures 7a and 7b.
9.3. DECODING PROCESS - WARP CONTOUR
[0179] The codebook indices of the warp contour nodes are decoded to warp values for the individual nodes as follows:
[0180] However, the mapping of the time warp codewords "tw_ratio[k]" onto decoded time warp values, designated here by "warp_value_tbl[tw_ratio[k]]", may optionally be dependent on the sampling frequency in embodiments according to the invention. Accordingly, in some embodiments according to the invention there is not a single mapping table, but there are individual mapping tables for different sampling frequencies.
[0181] To obtain the new per-sample warp contour data (n_long samples) "new_warp_contour[]", the warp node values "warp_node_values[]" are now linearly interpolated between the equally spaced (interp_dist apart) nodes using an algorithm, a pseudo-code representation of which is presented in Figure 9.
[0182] Before the full warp contour for this frame (for example, for a current frame) can be obtained, the buffered values of the previous warp contour may be rescaled, such that the last warp value of the past warp contour "past_warp_contour[]" = 1.
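The node interpolation of paragraph [0181] can be sketched as follows. This is a hedged illustration: the authoritative pseudo-code is the one of Figure 9, which is not reproduced in this text, and the function and parameter names below are assumptions.

```c
/* Linear interpolation of warp node values spaced interp_dist samples
   apart; out[] must hold (n_nodes - 1) * interp_dist samples. */
void interp_warp_contour(const double nodes[], int n_nodes, int interp_dist,
                         double out[])
{
    for (int k = 0; k + 1 < n_nodes; ++k)
        for (int n = 0; n < interp_dist; ++n)
            out[k * interp_dist + n] =
                nodes[k] + (nodes[k + 1] - nodes[k]) * (double)n / interp_dist;
}
```

With, e.g., 16 nodes and interp_dist = n_long / (NUM_TW_NODES - 1), this yields one warp value per sample of the new frame, as required for "new_warp_contour[]".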
[0183] The full warp contour "warp_contour[]" is obtained by concatenating the past warp contour "past_warp_contour" and the new warp contour "new_warp_contour", and the new warp sum "new_warp_sum" is computed as the sum of all new warp contour values "new_warp_contour[]":
[0184] From the warp contour "warp_contour[]", a vector of the sample positions of the warped samples on a linear time scale is computed. For this, the time warp contour is generated according to the following equations:
[0185] With the helper functions "warp_inv_vec()" and "warp_time_inv()", pseudo-code representations of which are presented in Figures 10a and 10b, respectively, the sample position vector and the transition lengths are computed according to an algorithm, a pseudo-code representation of which is shown in Figure 11.
9.5. DECODING PROCESS - INVERSE MODIFIED DISCRETE COSINE TRANSFORM (IMDCT)
[0186] In the following, the inverse modified discrete cosine transform will be briefly described.
[0187] The analytical expression of the inverse modified discrete cosine transform is as follows:
[0188] The synthesis window length for the inverse transform is a function of the syntax element "window_sequence" (which can be comprised in the bit stream) and of the algorithmic context. The synthesis window length can, for example, be defined according to the table of Figure 12.
[0189] The meaningful block transitions are listed in the table of Figure 13. A check mark in a given table cell indicates that a window sequence listed in that particular row may be followed by a window sequence listed in that particular column.
[0190] Regarding the permitted window sequences, it should be noted that the audio decoder can, for example, be switched between windows of different lengths. However, the switching of window lengths is not of particular relevance for the present invention.
Consequently, the present invention can be understood on the basis of the assumption that there is a window sequence of the type "only_long_sequence" and that the core coder frame length is equal to 1024.
[0191] In addition, it should be noted that the audio signal decoder can be switched between a frequency-domain encoding mode and a time-domain encoding mode. However, this possibility is not of particular relevance for the present invention. Consequently, the present invention is applicable to audio signal decoders that are only capable of handling the frequency-domain encoding mode, as discussed, for example, with reference to Figures 1b and 2b.
9.6. DECODING PROCESS - WINDOWING AND BLOCK SWITCHING
[0192] In the following, the windowing and block switching, which can be performed by the time warping frequency-domain-to-time-domain converter 180 and, in particular, by its windower 180g, will be described.
[0193] Depending on the element "window_shape" (which can be comprised in a bit stream representing the audio signal), different prototypes of oversampled transform windows are used, the length of the oversampled windows being Nos = 2·n_long·OS_FACTOR_WIN.
[0194] For window_shape == 1, the window coefficients are given by the Kaiser-Bessel-derived (KBD) window as follows:
[0195] Otherwise, for window_shape == 0, a sine window is used as follows:
[0196] For all kinds of window sequences, the prototype used for the left window part is the one determined by the window shape of the previous block. The following formula expresses this fact:
[0197] Likewise, the prototype for the right window shape is determined by the following formula:
[0198] Since the transition lengths are already determined, one must only differentiate between window sequences of the type "EIGHT_SHORT_SEQUENCE" and all other window sequences.
[0199] In the case that a current frame is of the type "EIGHT_SHORT_SEQUENCE", a windowing and an internal (frame-internal) overlap-and-add are carried out. The code-like portion of Figure 14 describes the windowing and the internal overlap-add of a frame having the window type "EIGHT_SHORT_SEQUENCE".
[0200] For frames of all other types, an algorithm can be used, a pseudo-code representation of which is presented in Figure 15.
9.7. DECODING PROCESS - TIME-VARYING RESAMPLING
[0201] In the following, the time-varying resampling, which can be performed by the time warping frequency-domain-to-time-domain converter 180 and, in particular, by its resampler 180i, will be described.
[0202] The windowed block z[] is resampled according to the sample positions (which are provided by the sample position calculator 180l based on the decoded time warp contour information 258) using the following impulse response:
[0203] Before the resampling, the windowed block is padded with zeros at both ends:
[0204] The resampling itself is described by a section of pseudo-code shown in Figure 16.
9.8. DECODING PROCESS - OVERLAP-AND-ADD WITH PREVIOUS WINDOW SEQUENCES
[0205] The overlap-and-add, which is performed by the overlapper/adder 180m of the time warping frequency-domain-to-time-domain converter 180, is the same for all sequences and can be described mathematically as follows:
[0206] In the following, a memory update will be described. Although no dedicated means are shown in Figure 2b, it should be noted that the memory update can be performed by the time warping frequency-domain-to-time-domain converter 180.
[0207] The memory buffers required for decoding the next frame are updated as follows:
past_warp_contour[n] = warp_contour[n + n_long], for 0 ≤ n < 2·n_long
cur_warp_sum = new_warp_sum
last_warp_sum = cur_warp_sum
[0208] Before decoding the first frame, or if the last frame was encoded with an LPC-domain encoder, the memory states are set as follows:
past_warp_contour[n] = 1, for 0 ≤ n < 2·n_long
cur_warp_sum = n_long
last_warp_sum = n_long
9.10. DECODING PROCESS - CONCLUSION
[0209] To summarize the above, a decoding process has been described which can be performed by the time warping frequency-domain-to-time-domain converter 180. As can be seen, a time-domain representation is provided for an audio frame of, for example, 2048 time-domain samples, and subsequent audio frames may, for example, overlap by approximately 50%, such that a smooth transition between the time-domain representations of subsequent audio frames is ensured.
[0210] A set of, for example, NUM_TW_NODES = 16 decoded time warp values can be associated with each of the audio frames (as long as the time warping is active in said audio frame), independent of the actual sampling frequency of the time-domain samples of the audio frame.
10. SPECTRAL NOISELESS CODING
[0211] In the following, some details regarding the spectral noiseless coding will be described, which can be performed by the context-based spectral value decoder 160 in combination with the context state determiner 170. It should be noted that a corresponding encoding can be performed by the context-based spectral value encoder in combination with the context state determiner 140, wherein a person skilled in the art will understand the respective encoding steps from the detailed discussion of the decoding steps.
10.1. SPECTRAL NOISELESS CODING - TOOL DESCRIPTION
[0212] The spectral noiseless coding is used to further reduce the redundancy of the quantized spectrum. The spectral noiseless coding scheme is based on an arithmetic coding in conjunction with a dynamically adapted context.
The spectral noiseless coding scheme discussed below is based on 2-tuples, i.e., two adjacent spectral coefficients are combined. Each 2-tuple is split into the sign, the most significant 2-bits-wise plane, and the remaining less significant bit planes. The noiseless coding of the most significant 2-bits-wise plane, m, uses context-dependent cumulative frequency tables derived from the four previously decoded 2-tuples. The noiseless coding is fed with the quantized spectral values and uses context-dependent cumulative frequency tables derived from (for example, selected in dependence on) four previously decoded neighboring 2-tuples. Here, the neighborhood both in time and in frequency is taken into account, as illustrated in Figure 16, which presents a graphical representation of a context for a state computation. The cumulative frequency tables are then used by the arithmetic coder (encoder or decoder) to generate a variable-length binary code.
[0213] However, it should be noted that a different context size can be chosen. For example, a smaller or larger number of tuples, lying in an environment of a tuple to be decoded, can be used for the context determination. Also, a tuple may comprise a smaller or larger number of spectral values. Alternatively, individual spectral values can be used for obtaining the context, instead of tuples.
[0214] The arithmetic coder produces a binary code for a given set of symbols and their respective probabilities. The binary code is generated by mapping a probability interval, in which the set of symbols lies, onto a codeword.
10.2. SPECTRAL NOISELESS CODING - DEFINITIONS
[0215] Regarding the definitions of the variables, constants and so on, reference is made to Figure 18, which presents a definition legend.
10.3. DECODING PROCESS
[0216] The quantized spectral coefficients "x_ac_dec[]" are noiselessly decoded, starting from the lowest-frequency coefficient and progressing to the highest-frequency coefficient.
They are decoded, for example, in groups of two successive coefficients a and b, gathered in a so-called 2-tuple (a, b).
[0217] The decoded coefficients x_ac_dec[] for a frequency-domain mode (as described above) are then stored in an array "x_ac_quant[g][win][sfb][bin]". The transmission order of the noiseless coding codewords is such that, when they are decoded in the order received and stored in the array, bin is the fastest-incrementing index and g is the slowest-incrementing index. Within a codeword, the decoding order is a and then b.
[0218] Optionally, the coefficients for a transform-coded-excitation mode can also be evaluated. Although the above examples relate only to frequency-domain audio encoding and frequency-domain audio decoding, the concepts disclosed here can, in fact, also be used for audio encoders and audio decoders operating in the transform-coded-excitation domain. The decoded coefficients x_ac_dec[] for the transform coded excitation (TCX) are stored directly in an array x_tcx_invquant[win][bin], and the transmission order of the noiseless coding codewords is such that, when they are decoded in the order received and stored in the array, bin is the fastest-incrementing index and win is the slowest-incrementing index. Within a codeword, the decoding order is a and then b.
[0219] First, the (optional) flag "arith_reset_flag" determines whether the context is to be reset (or may be reset). If the flag is TRUE, an initialization is carried out.
[0220] The decoding process starts with an initialization phase, in which the context element vector q is updated by copying and mapping the context elements of the previous frame stored in the arrays (or sub-arrays) q[1][] into q[0][]. The context elements within q are stored, for example, with 4 bits per 2-tuple. For details regarding the initialization, reference is made to the algorithm, a pseudo-code representation of which is shown in Figure 19.
[0221] Subsequent to the initialization, which can be performed according to the algorithm of Figure 19, the frequency scaling of the context, which was discussed above, can be performed. For example, the array (or sub-array) q[0][] can be considered a preliminary context memory structure 432 (or may be equivalent to the array self->base.m_qbuf[][], except for details regarding the dimensions and regarding the entries e and v). In addition, the frequency-scaled context can be stored back into the array q[0][] (or into the array "self->base.m_qbuf[][]"). Alternatively, however, or in addition, the contents of the array (or sub-array) q[1][] can be scaled in frequency by the means 438.
[0222] In short, the noiseless decoder outputs 2-tuples of unsigned quantized spectral coefficients. First (or, typically, after the frequency scaling), the state c of the context is computed based on the previously decoded spectral coefficients surrounding the 2-tuple to be decoded. Thereby, the state is updated incrementally, using the context state of the last decoded 2-tuple and considering only the two new 2-tuples. The state is encoded, for example, on 17 bits and is returned by the function "arith_get_context[]", a pseudo-code representation of which is shown in Figure 20.
[0223] The context state c, which is obtained as the return value of the function "arith_get_context[]", determines the cumulative frequency table used for decoding the most significant 2-bits-wise plane m. The mapping from c to the corresponding cumulative frequency table index pki is performed by the function "arith_get_pk[]", a pseudo-code representation of which is shown in Figure 21.
[0224] The value m is decoded using the function "arith_decode[]" called with the cumulative frequency table "arith_cf_m[pki][]", where pki corresponds to the index returned by the function "arith_get_pk[]".
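On the encoder side, the symbol m corresponds to the most significant 2-bits-wise plane of the unsigned pair (a, b) after lev less significant bit planes have been split off. The following is a hedged sketch of one plausible packing; the exact packing is defined by the tables and figures referenced above, which are not reproduced in this text.

```c
/* Assumed packing: after discarding lev less-significant bit planes,
   the two remaining 2-bit magnitudes of a and b form one of 16 symbols. */
int msb_symbol(int a, int b, int lev)
{
    int am = (a >> lev) & 3;  /* 2 most significant remaining bits of a */
    int bm = (b >> lev) & 3;  /* 2 most significant remaining bits of b */
    return (bm << 2) | am;
}
```

Whenever a magnitude does not fit into this 2-bit range, the escape mechanism described next (symbol "ARITH_ESCAPE", increasing lev) is used instead of a larger alphabet.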
The arithmetic coder is an integer implementation using a tag generation method with scaling. The pseudo C code according to Figure 22 describes the algorithm used.
[0225] When the decoded value m is the escape symbol "ARITH_ESCAPE", the variables "lev" and "esc_nb" are incremented by one and another value m is decoded. In this case, the function "get_pk[]" is called again with the value c & esc_nb << 17 as input argument, where esc_nb is the number of escape symbols previously decoded for the same 2-tuple, bounded to 7.
[0226] If the value m is not the escape symbol "ARITH_ESCAPE", the decoder checks whether the successive m form an "ARITH_STOP" symbol. If the condition (esc_nb > 0 && m == 0) is true, "ARITH_STOP" is detected and the decoding process is terminated. The decoder jumps directly to the sign decoding described later. The condition means that the rest of the frame is composed of zero values.
[0227] If the symbol "ARITH_STOP" is not encountered, the remaining bit planes, if any exist for the present 2-tuple, are then decoded. The remaining bit planes are decoded from the most significant to the least significant level by calling the function "arith_decode[]" several times. The decoded bit planes r allow the previously decoded values a, b to be refined according to an algorithm, a pseudo-code representation of which is shown in Figure 23.
[0228] At this point, the unsigned value of the 2-tuple (a, b) is completely decoded. It is saved in the array "x_ac_dec[]" holding the spectral coefficients, as shown in the pseudo-code of Figure 24.
[0229] The context q is also updated for the next 2-tuple. It should be noted that this context update can also be performed for the last 2-tuple. The context update is performed by the function "arith_update_context[]", a pseudo-code representation of which is shown in Figure 25.
[0230] The next 2-tuple of the frame is then decoded by incrementing i by one and redoing the same process as described above.
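The bit-plane refinement of paragraph [0227] can be sketched as follows; this is an assumption modelled on the scheme described above (each decoded plane r contributes one less significant bit to a and one to b), while the authoritative version is the pseudo-code of Figure 23.

```c
/* One refinement step: shift the previously decoded magnitudes up and
   append the two fresh bits carried by the decoded bit plane r. */
void refine(int *a, int *b, int r)
{
    *a = (*a << 1) | (r & 1);
    *b = (*b << 1) | ((r >> 1) & 1);
}
```

Calling this once per decoded plane, from the most significant to the least significant level, reconstructs the full unsigned magnitudes of the 2-tuple.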
In particular, the frequency scaling of the context can be performed, and the process described above can subsequently be restarted from the function "arith_get_context()". Once lg/2 2-tuples are decoded within the frame, or when the stop symbol "ARITH_STOP" occurs, the decoding process of the spectral amplitudes ends and the decoding of the signs begins. [0231] Once all unsigned quantized spectral coefficients are decoded, the corresponding signs are applied. For each non-zero quantized value of "x_ac_dec", a bit is read. If the bit read is equal to one, the quantized value is positive; nothing is done, and the signed value is equal to the previously decoded unsigned value. Otherwise, the decoded coefficient is negative, and the two's complement of the unsigned value is taken. Sign bits are read from the low to the high frequencies. [0232] The decoding is finished by calling the function "arith_finish()", whose pseudo-code is shown in Figure 26. The remaining spectral coefficients are set to zero. The respective context states are updated accordingly. [0233] To summarize the above, a context-based (or context-dependent) decoding of spectral values is performed, where individual spectral values can be decoded or where spectral values can be decoded in tuples (as shown above). The context can be scaled along the frequency axis, as discussed here, in order to obtain a good coding/decoding performance in the case of a temporal variation of the fundamental frequency (or, equivalently, of the pitch). 11. AUDIO STREAM ACCORDING TO FIGURES 27a-27f [0234] In the following, an audio stream will be described which comprises an encoded representation of one or more audio signal channels and of one or more time warp contours. The audio stream described below can, for example, carry the encoded audio signal representation 112 or the encoded audio signal representation 152.
[0235] Figure 27a shows a graphical representation of a so-called "USAC_raw_data_block" data stream element, which may comprise a single channel element (SCE), a channel pair element (CPE) or a combination of one or more single channel elements and/or one or more channel pair elements. [0236] The "USAC_raw_data_block" may typically comprise a block of encoded audio data, while additional time warp contour information can be provided in a separate data stream element. Naturally, however, it is also possible to encode some time warp contour data into the "USAC_raw_data_block". [0237] As can be seen from Figure 27b, a single channel element typically comprises a frequency domain channel stream ("fd_channel_stream"), which will be explained in detail with reference to Figure 27d. [0238] As can be seen from Figure 27c, a channel pair element ("channel_pair_element") typically comprises a plurality of frequency domain channel streams. Also, the channel pair element may comprise time warp information such as, for example, a time warp activation flag ("tw_MDCT"), which can be transmitted in a configuration data stream element or in the "USAC_raw_data_block", and which determines whether time warp information is included in the channel pair element. For example, if the "tw_MDCT" flag indicates that the time warping is active, the channel pair element may comprise a flag ("common_tw") indicating whether there is a common time warp for the audio channels of the channel pair element. If said flag ("common_tw") indicates that there is a common time warp for multiple audio channels, then a common time warp information ("tw_data") is included in the channel pair element, separately from the individual frequency domain channel streams. [0239] Referring now to Figure 27d, the frequency domain channel stream is described. As can be seen from Figure 27d, the frequency domain channel stream comprises, for example, a global gain information.
Also, the frequency domain channel stream comprises time warp data if the time warping is active (the "tw_MDCT" flag is active) and if there is no common time warp information for multiple audio signal channels (the "common_tw" flag is inactive). [0240] In addition, a frequency domain channel stream also comprises scale factor data ("scale_factor_data") and encoded spectral data (for example, arithmetically encoded spectral data "ac_spectral_data"). [0241] Referring now to Figure 27e, the syntax of the time warp data is briefly discussed. The time warp data can, for example, optionally comprise a flag (for example, "tw_data_present" or "active_pitch_data") indicating whether time warp data are present. If the time warp data are present (i.e., the time warp contour is not flat), the time warp data may comprise a sequence of a plurality of encoded time warp ratio values (for example, "tw_ratio[i]" or "pitchIdx[i]"), which can, for example, be encoded according to a sample-rate-dependent codebook table, as described above. [0242] Thus, the time warp data may comprise a flag indicating that there are no time warp data available, which can be set by an audio signal encoder if the time warp contour is constant (time warp ratios approximately equal to 1.000). Conversely, if the time warp contour is varying, the ratios between subsequent time warp contour nodes can be encoded using codebook indices, constituting the information "tw_ratio". [0243] Figure 27f presents a graphical representation of the syntax of the arithmetically encoded spectral data "ac_spectral_data()". The arithmetically encoded spectral data are encoded depending on the state of an independence flag (here: "indepFlag"), which indicates, if active, that the arithmetically encoded data are independent of the arithmetically encoded data of a previous frame. If the independence flag "indepFlag" is active, an arithmetic reset flag "arith_reset_flag" is set to be active.
Otherwise, the value of the arithmetic reset flag is determined by a bit in the arithmetically encoded spectral data. [0244] In addition, the arithmetically encoded spectral data block "ac_spectral_data()" comprises one or more arithmetically encoded data units, wherein the number of arithmetically encoded data units "arith_data()" depends on the number of blocks (or windows) in the current frame. In a long block mode, there is only one window per audio frame. In a short block mode, however, there may be, for example, eight windows per audio frame. Each arithmetically encoded spectral data unit "arith_data()" comprises a set of spectral coefficients, which can serve as the input for a frequency domain to time domain transformation, which can be performed, for example, by the inverse transform 180e. [0245] The number of spectral coefficients per arithmetically encoded data unit "arith_data()" can, for example, be independent of the sampling frequency, but it can depend on the block length mode (short block mode "EIGHT_SHORT_SEQUENCE" or long block mode "ONLY_LONG_SEQUENCE"). 12. CONCLUSIONS [0246] To summarize the above, improvements in the context of the time-warped modified discrete cosine transform were discussed. The invention described here lies in the context of a time-warped modified discrete cosine transform coder (see, for example, references [1] and [2]) and comprises methods for an improved performance of a time-warped MDCT transform coder. An implementation of this time-warped modified discrete cosine transform coder is pursued in the ongoing MPEG USAC audio coding standardization work (see, for example, reference [3]). Details of the TW-MDCT implementation used can be found, for example, in reference [4]. [0247] Improvements over the concepts mentioned are, however, suggested herein. 13.
IMPLEMENTATION ALTERNATIVES [0248] Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or to a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block, item or feature of a corresponding apparatus. Some or all of the method steps can be executed by (or using) a hardware apparatus, such as, for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, some one or more of the most important method steps can be executed by such an apparatus. [0249] The inventive encoded audio signal can be stored on a digital storage medium or can be transmitted on a transmission medium, such as a wireless transmission medium or a wired transmission medium, such as the Internet. [0250] Depending on certain implementation requirements, the embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are able to cooperate) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium can be computer readable. [0251] Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are able to cooperate with a programmable computer system, such that one of the methods described herein is performed.
[0252] In general, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative to perform one of the methods when the computer program product runs on a computer. The program code can, for example, be stored on a machine-readable carrier. [0253] Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine-readable carrier. [0254] In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer. [0255] A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory. [0256] A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or sequence of signals can, for example, be configured to be transferred via a data communication connection, for example via the Internet. [0257] A further embodiment comprises a processing means, for example a computer or a programmable logic device, configured or adapted to perform one of the methods described herein. [0258] A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein. [0259] A further embodiment according to the invention comprises an apparatus or system configured to transfer (for example, electronically or optically) to a receiver a computer program for performing one of the methods described herein.
The receiver can, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system can, for example, comprise a file server for transferring the computer program to the receiver. [0260] In some embodiments, a programmable logic device (for example, a field programmable gate array) can be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array can cooperate with a microprocessor in order to perform one of the methods described herein. In general, the methods are preferably performed by any hardware apparatus. [0261] The embodiments described above are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the appended patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.
Claims (16) [0001] AUDIO SIGNAL DECODER (150; 240) TO PROVIDE A DECODED AUDIO SIGNAL REPRESENTATION (154) BASED ON AN ENCODED AUDIO SIGNAL REPRESENTATION (152) comprising an encoded spectrum representation (ac_spectral_data[]) and an encoded time warp information (tw_data[]), the audio signal decoder characterized by comprising: a context-based spectral value decoder (160) configured to decode a codeword (acod_m) describing one or more spectral values, or at least a part (m) of a numerical representation of one or more spectral values, depending on a context state, to obtain decoded spectral values (162, 297, x_ac_dec[]); a context state determiner (170; 400) configured to determine a current context state (164, c) depending on one or more previously decoded spectral values (162, 297); a time-warping frequency domain to time domain converter (180) configured to provide a time-warped time domain representation (182) of a given audio frame based on a set of decoded spectral values (162, 297) associated with the given audio frame and provided by the context-based spectral value decoder, and depending on the time warp information; wherein the context state determiner (170; 400) is configured to adapt the determination of the context state to a change in a fundamental frequency between subsequent audio frames.
[0002] AUDIO SIGNAL DECODER according to claim 1, characterized in that the time warp information (tw_data) describes a variation (prel) of a pitch over time; wherein the context state determiner (170; 400) is configured to derive a frequency stretching information (s; m_ContextUpdateRatio) from the time warp information (tw_data); and wherein the context state determiner is configured to stretch or compress a previous context (432, q[0][], 450) associated with a previous audio frame along the frequency axis, depending on the frequency stretching information (s, m_ContextUpdateRatio), to obtain an adapted context (440, q[0][], 452) for a context-based decoding of one or more spectral values of a current audio frame. [0003] AUDIO SIGNAL DECODER according to claim 2, characterized in that the context state determiner (170, 400) is configured to derive a first average frequency information (frel,mean,k-1) over a first audio frame from the time warp information (tw_data, prel, warp_contour[]) and to derive a second average frequency information (frel,mean,k) over a second audio frame following the first audio frame from the time warp information; and wherein the context state determiner is configured to compute a ratio between the second average frequency information (frel,mean,k) over the second audio frame and the first average frequency information (frel,mean,k-1) over the first audio frame, in order to determine the frequency stretching information (s, m_ContextUpdateRatio).
[0004] AUDIO SIGNAL DECODER according to claim 2, characterized in that the context state determiner (170; 400) is configured to determine a first average time warp contour information (prel,mean,k-1) over a first audio frame from the time warp information (tw_data, prel, warp_contour[]), wherein the context state determiner is configured to derive a second average time warp contour information (prel,mean,k) over a second audio frame following the first audio frame from the time warp information (252, tw_data, prel, warp_contour[]), and wherein the context state determiner is configured to compute a ratio between the first average time warp contour information (prel,mean,k-1) over the first audio frame and the second average time warp contour information (prel,mean,k) over the second audio frame, in order to determine the frequency stretching information (s, m_ContextUpdateRatio). [0005] AUDIO SIGNAL DECODER according to claim 3 or claim 4, characterized in that the context state determiner (170, 400) is configured to derive the first and second average frequency information, or the first and second average time warp contour information, from a common time warp contour (warp_contour[]) extending over a plurality of consecutive audio frames. [0006] AUDIO SIGNAL DECODER according to claim 3, claim 4 or claim 5, characterized in that the audio signal decoder comprises a time warp calculator (250) configured to calculate a time warp contour information (prel[], warp_contour[], 258) describing a temporal evolution of a relative pitch over a plurality of consecutive audio frames based on the time warp information (tw_data, 252), and wherein the context state determiner (170, 400) is configured to use the time warp contour information to derive the frequency stretching information.
[0007] AUDIO SIGNAL DECODER according to claim 6, characterized in that the audio signal decoder comprises a resampling position calculator (180l), wherein the resampling position calculator (180l) is configured to calculate resampling positions for use by a time warp resampler (180i) based on the time warp contour information (prel[], warp_contour[], 258), such that a temporal variation of the resampling positions is determined by the time warp contour information. [0008] AUDIO SIGNAL DECODER according to any one of claims 1 to 7, characterized in that the context state determiner (170, 400) is configured to derive a current numerical context value (164, c), which describes the context state, depending on a plurality of previously decoded spectral values, and to select a mapping rule (cum_freq[]) describing a mapping of a code value (acod_m) onto a symbol code (symbol) representing one or more spectral values, or a part (m) of a numerical representation of one or more spectral values, depending on the current numerical context value, wherein the context-based spectral value decoder (160) is configured to decode the code value (acod_m) describing one or more spectral values, or at least a part (m) of a numerical representation of one or more spectral values, using the mapping rule (cum_freq[]) selected by the context state determiner.
[0009] AUDIO SIGNAL DECODER according to claim 8, characterized in that the context state determiner (170, 400) is configured to set up and update a preliminary context memory structure (432, m_qbuf), such that the entries of the preliminary context memory structure describe one or more spectral values (162, 297) of a first audio frame, wherein the entry indices of the entries of the preliminary context memory structure are indicative of a frequency bin, or of a set of adjacent frequency bins, of the frequency domain to time domain converter (180e) with which the respective entries are associated; wherein the context state determiner is configured to obtain a frequency-scaled context memory structure (440; m_qbuf) for a decoding of a second audio frame following the first audio frame based on the preliminary context memory structure, such that a given entry (450a, 450c, self->base.m_qbuf[nWarpTupleIdx]) or sub-entry (self->base.m_qbuf[nWarpTupleIdx].a) of the preliminary context memory structure having a first frequency index (i1+1, i2+2, nWarpTupleIdx) is mapped onto a corresponding entry (452a, 452c, self->base.m_qbuf[nLinTupleIdx]) or sub-entry (self->base.m_qbuf[nLinTupleIdx].a) of the frequency-scaled context memory structure (440, m_qbuf, 452) having a second frequency index (i1, i2-1, nLinTupleIdx), wherein the second frequency index is associated with a different frequency bin, or set of adjacent frequency bins, of the frequency domain to time domain converter (180e) than the first frequency index.
[0010] AUDIO SIGNAL DECODER according to claim 9, characterized in that the context state determiner (170, 400) is configured to derive a context state value (164, 420) describing the current context state, for a decoding of a codeword (acod_m) describing one or more spectral values of the second audio frame, or at least a part (m) of a numerical representation of one or more spectral values of the second audio frame, with which a third frequency index (i1) is associated, using values of the frequency-scaled context memory structure (440, m_qbuf, 452) the frequency indices (i1-1, i1, i1+1) of which are in a predetermined relationship with the third frequency index (i1), wherein the third frequency index (i1) designates a frequency bin, or a set of adjacent frequency bins, of the frequency domain to time domain converter (180e) with which the one or more spectral values of the second audio frame to be decoded using the current context state are associated. [0011] AUDIO SIGNAL DECODER according to claim 9 or claim 10, characterized in that the context state determiner (170; 400) is configured to set each of a plurality of entries (452a, 452c, self->base.m_qbuf[nLinTupleIdx]) of the frequency-scaled context memory structure (440, 452, m_qbuf) having a respective target frequency index (i1, i2-1, nLinTupleIdx) to a value of a corresponding entry (450a, 450c, self->base.m_qbuf[nWarpTupleIdx]) of the preliminary context memory structure (432, 450, m_qbuf) having a corresponding source frequency index (i1+1, i2+2, nWarpTupleIdx), wherein the context state determiner is configured to determine corresponding frequency indices (i1, i1+1; i2-1, i2+2; nLinTupleIdx, nWarpTupleIdx) of an entry of the frequency-scaled context memory structure and of a corresponding entry of the preliminary context memory structure, such that a ratio between said corresponding frequency indices (nLinTupleIdx, nWarpTupleIdx) is determined by the change in the fundamental frequency between a current audio frame, with which the entries of the preliminary context memory structure are associated, and a subsequent audio frame, the decoding context of which is determined by the entries of the frequency-scaled context memory structure.
[0012] AUDIO SIGNAL DECODER according to claim 9 or claim 10, characterized in that the context state determiner (170, 400) is configured to set up the preliminary context memory structure (432, m_qbuf, 450) in such a manner that each of a plurality of entries (450a, 450c, self->base.m_qbuf[nWarpTupleIdx]) of the preliminary context memory structure is based on a plurality of spectral values (a, b, c, d) of a first audio frame, wherein the entry indices (i1+1, i2+2, nWarpTupleIdx) of the entries of the preliminary context memory structure (432, 450, m_qbuf) are indicative of a set of adjacent frequency bins of the frequency domain to time domain converter (180e) with which the respective entries are associated; wherein the context state determiner is configured to extract preliminary per-frequency-bin individual context values (lineReorderBuf[(curTuple-1)*4+0], ..., lineReorderBuf[(curTuple-1)*4+3]) having associated individual frequency bin indices from the entries (self->base.m_qbuf[curTuple][]) of the preliminary context memory structure; wherein the context state determiner is configured to obtain frequency-scaled per-frequency-bin individual context values (lineTmpBuf[linLineIdx]) having associated individual frequency bin indices (linLineIdx), such that a given preliminary per-frequency-bin individual context value (lineReorderBuf[warpLineIdx]) having a first frequency bin index (warpLineIdx) is mapped onto a corresponding frequency-scaled per-frequency-bin individual context value (lineTmpBuf[linLineIdx]) having a second frequency bin index (linLineIdx), such that a mapping of the preliminary per-frequency-bin individual context values with individual frequency bin resolution is obtained; and wherein the context state determiner is configured to combine a plurality of frequency-scaled per-frequency-bin individual context values (lineTmpBuf[(curTuple-1)*4+0], ..., lineTmpBuf[(curTuple-1)*4+3]) into a combined entry (self->base.m_qbuf[curTuple][]) of the frequency-scaled context memory structure. [0013] AUDIO SIGNAL ENCODER (100; 200) TO PROVIDE AN ENCODED REPRESENTATION (112) OF AN INPUT AUDIO SIGNAL (110), the encoded representation comprising an encoded spectrum representation (132) and an encoded time warp information (226), the audio signal encoder characterized by comprising: a frequency domain representation provider (120) configured to provide a frequency domain representation (124) representing a time-warped version of the input audio signal, time-warped according to a time warp information (122); a context-based spectral value encoder (130) configured to provide a codeword (acod_m) describing one or more spectral values of the frequency domain representation (124), or at least a part (m) of a numerical representation of one or more spectral values of the frequency domain representation (124), depending on a context state (134), to obtain encoded spectral values (acod_m) of the encoded spectrum representation (132); and a context state determiner (140) configured to determine a current context state (134) depending on one or more previously encoded spectral values, wherein the context state determiner (140) is configured to adapt the determination of the context state to a change in a fundamental frequency between subsequent audio frames.
[0014] AUDIO SIGNAL ENCODER according to claim 13, characterized in that the context state determiner is configured to derive a current numerical context value (134, c) depending on a plurality of previously encoded spectral values and to select a mapping rule describing a mapping of one or more spectral values, or of a part (m) of a numerical representation of one or more spectral values, onto a code value (acod_m) depending on the current numerical context value, wherein the context-based spectral value encoder is configured to provide the code value describing one or more spectral values, or at least a part of a numerical representation of one or more spectral values, using the mapping rule selected by the context state determiner. [0015] METHOD FOR PROVIDING A DECODED AUDIO SIGNAL REPRESENTATION (154) BASED ON AN ENCODED AUDIO SIGNAL REPRESENTATION (152) comprising an encoded spectrum representation (ac_spectral_data[]) and an encoded time warp information (tw_data[]), the method characterized by comprising: decoding a codeword (acod_m) describing one or more spectral values, or at least a part (m) of a numerical representation of one or more spectral values, depending on a context state, to obtain decoded spectral values (162, 297, x_ac_dec[]); determining a current context state (164, c) depending on one or more previously decoded spectral values (162, 297); and providing a time-warped time domain representation (182) of a given audio frame based on a set of decoded spectral values (162, 297) associated with the given audio frame, provided by the context-based decoding of spectral values, and depending on the time warp information; wherein the determination of the context state is adapted to a change in a fundamental frequency between subsequent audio frames.
[0016] METHOD FOR PROVIDING AN ENCODED REPRESENTATION (112) OF AN INPUT AUDIO SIGNAL (110), the encoded representation comprising an encoded spectrum representation (132) and an encoded time warp information (226), the method characterized by comprising: providing a frequency domain representation (124) representing a time-warped version of the input audio signal, time-warped according to a time warp information (122); providing a codeword (acod_m) describing one or more spectral values of the frequency domain representation (124), or at least a part (m) of a numerical representation of one or more spectral values of the frequency domain representation (124), depending on a context state (134), to obtain encoded spectral values (acod_m) of the encoded spectrum representation (132); and determining a current context state (134) depending on one or more previously encoded spectral values, wherein the determination of the context state is adapted to a change in a fundamental frequency between subsequent audio frames.
Patent family (publication number | publication date):
AU2011226143B2|2014-08-28| WO2011110594A1|2011-09-15| JP5625076B2|2014-11-12| ES2458354T3|2014-05-05| US9129597B2|2015-09-08| AR080396A1|2012-04-04| PL2539893T3|2014-09-30| KR101445296B1|2014-09-29| RU2586848C2|2016-06-10| ES2461183T3|2014-05-19| TW201203224A|2012-01-16| CN102884572B|2015-06-17| CN102884572A|2013-01-16| TWI455113B|2014-10-01| AU2011226140B2|2014-08-14| TW201207846A|2012-02-16| JP2013521540A|2013-06-10| BR112012022741A2|2020-11-24| JP2013522658A|2013-06-13| AU2011226143B9|2015-03-19| RU2012143323A|2014-04-20| TWI441170B|2014-06-11| RU2012143340A|2014-04-20| EP2532001B1|2014-04-02| CN102884573B|2014-09-10| CA2792500C|2016-05-03| MX2012010439A|2013-04-29| KR101445294B1|2014-09-29| HK1179743A1|2013-10-04| RU2607264C2|2017-01-10| JP5456914B2|2014-04-02| MX2012010469A|2012-12-10| CA2792500A1|2011-09-15| AU2011226140A1|2012-10-18| EP2539893B1|2014-04-02| US9524726B2|2016-12-20| US20130073296A1|2013-03-21| WO2011110591A1|2011-09-15| PL2532001T3|2014-09-30| AU2011226143A1|2012-10-25| CA2792504C|2016-05-31| US20130117015A1|2013-05-09| HK1181540A1|2013-11-08| BR112012022744A2|2017-12-12| KR20120128156A|2012-11-26| EP2539893A1|2013-01-02| CN102884573A|2013-01-16| EP2532001A1|2012-12-12| AR084465A1|2013-05-22| CA2792504A1|2011-09-15| KR20130018761A|2013-02-25|
Citations:
Publication number | Filing date | Publication date | Applicant | Patent title
US7272556B1 | 1998-09-23 | 2007-09-18 | Lucent Technologies Inc. | Scalable and embedded codec for speech and audio signals
JP4196235B2 | 1999-01-19 | 2008-12-17 | Sony Corporation | Audio data processing device
DE60018246T2 | 1999-05-26 | 2006-05-04 | Koninklijke Philips Electronics N.V. | System for transmitting an audio signal
US6581032B1 | 1999-09-22 | 2003-06-17 | Conexant Systems, Inc. | Bitstream protocol for transmission of encoded voice signals
CA2365203A1 | 2001-12-14 | 2003-06-14 | Voiceage Corporation | A signal modification method for efficient coding of speech signals
US20040098255A1 | 2002-11-14 | 2004-05-20 | France Telecom | Generalized analysis-by-synthesis speech coding method, and coder implementing such method
US7394833B2 | 2003-02-11 | 2008-07-01 | Nokia Corporation | Method and apparatus for reducing synchronization delay in packet switched voice terminals using speech decoder modification
JP4364544B2 | 2003-04-09 | 2009-11-18 | Kobe Steel, Ltd. | Audio signal processing apparatus and method
CN101171626B | 2005-03-11 | 2012-03-21 | Qualcomm Incorporated | Time warping frames inside the vocoder by modifying the residual
TWI319565B | 2005-04-01 | 2010-01-11 | Qualcomm Inc. | Methods and apparatus for generating highband excitation signal
US7720677B2 | 2005-11-03 | 2010-05-18 | Coding Technologies AB | Time warped modified transform coding of audio signals
EP2054879B1 | 2006-08-15 | 2010-01-20 | Broadcom Corporation | Re-phasing of decoder states after packet loss
CN101366080B | 2006-08-15 | 2011-10-19 | Broadcom Corporation | Method and system for updating state of decoder
US8239190B2 | 2006-08-22 | 2012-08-07 | Qualcomm Incorporated | Time-warping frames of wideband vocoder
US9653088B2 | 2007-06-13 | 2017-05-16 | Qualcomm Incorporated | Systems, methods, and apparatus for signal encoding using pitch-regularizing and non-pitch-regularizing coding
EP2015293A1 | 2007-06-14 | 2009-01-14 | Deutsche Thomson OHG | Method and apparatus for encoding and decoding an audio signal using adaptively switched temporal resolution in the spectral domain
EP2107556A1 | 2008-04-04 | 2009-10-07 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio transform coding using pitch correction
KR101456641B1 | 2008-07-11 | 2014-11-04 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio encoder and audio decoder
MY154452A | 2008-07-11 | 2015-06-15 | Fraunhofer Ges Forschung | An apparatus and a method for decoding an encoded audio signal
PL2311033T3 | 2008-07-11 | 2012-05-31 | Fraunhofer Ges Forschung | Providing a time warp activation signal and encoding an audio signal therewith
US8600737B2 | 2010-06-01 | 2013-12-03 | Qualcomm Incorporated | Systems, methods, apparatus, and computer program products for wideband speech coding
EP2083418A1 | 2008-01-24 | 2009-07-29 | Deutsche Thomson OHG | Method and apparatus for determining and using the sampling frequency for decoding watermark information embedded in a received signal sampled with an original sampling frequency at encoder side
US8924222B2 | 2010-07-30 | 2014-12-30 | Qualcomm Incorporated | Systems, methods, apparatus, and computer-readable media for coding of harmonic signals
US9208792B2 | 2010-08-17 | 2015-12-08 | Qualcomm Incorporated | Systems, methods, apparatus, and computer-readable media for noise injection
CN103035249B | 2012-11-14 | 2015-04-08 | Beijing Institute of Technology | Audio arithmetic coding method based on time-frequency plane context
US9466305B2 | 2013-05-29 | 2016-10-11 | Qualcomm Incorporated | Performing positional analysis to code spherical harmonic coefficients
US20140355769A1 | 2013-05-29 | 2014-12-04 | Qualcomm Incorporated | Energy preservation for decomposed representations of a sound field
SG10201708531PA | 2013-06-21 | 2017-12-28 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Time scaler, audio decoder, method and a computer program using a quality control
PL3011692T3 | 2013-06-21 | 2017-11-30 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Jitter buffer control, audio decoder, method and computer program
MX357135B | 2013-10-18 | 2018-06-27 | Fraunhofer Ges Forschung | Coding of spectral coefficients of a spectrum of an audio signal
WO2015057135A1 | 2013-10-18 | 2015-04-23 | Telefonaktiebolaget L M Ericsson | Coding and decoding of spectral peak positions
FR3015754A1 | 2013-12-20 | 2015-06-26 | Orange | Resampling of an audio signal clocked at a variable sampling frequency according to the frame
US9502045B2 | 2014-01-30 | 2016-11-22 | Qualcomm Incorporated | Coding independent frames of ambient higher-order ambisonic coefficients
US9922656B2 | 2014-01-30 | 2018-03-20 | Qualcomm Incorporated | Transitioning of ambient higher-order ambisonic coefficients
ES2741506T3 | 2014-03-14 | 2020-02-11 | Ericsson Telefon AB L M | Audio coding method and apparatus
US9620137B2 | 2014-05-16 | 2017-04-11 | Qualcomm Incorporated | Determining between scalar and vector quantization in higher order ambisonic coefficients
US10770087B2 | 2014-05-16 | 2020-09-08 | Qualcomm Incorporated | Selecting codebooks for coding vectors decomposed from higher-order ambisonic audio signals
US9852737B2 | 2014-05-16 | 2017-12-26 | Qualcomm Incorporated | Coding vectors decomposed from higher-order ambisonics audio signals
US9747910B2 | 2014-09-26 | 2017-08-29 | Qualcomm Incorporated | Switching between predictive and non-predictive quantization techniques in a higher order ambisonics framework
WO2016142002A1 | 2015-03-09 | 2016-09-15 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio encoder, audio decoder, method for encoding an audio signal and method for decoding an encoded audio signal
CN105070292B | 2015-07-10 | 2018-11-16 | Zhuhai Jieli Technology Co., Ltd. | Method and system for reordering audio file data
EP3306609A1 | 2016-10-04 | 2018-04-11 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for determining a pitch information
JP2021500627A | 2017-10-27 | 2021-01-07 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Noise attenuation in the decoder
Legal status:
2017-12-19 | B15I | Others concerning applications: loss of priority
2018-02-27 | B12F | Appeal: other appeals
2019-09-17 | B06U | Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]
2020-07-14 | B07A | Technical examination (opinion): publication of technical examination (opinion) [chapter 7.1 patent gazette]
2020-11-24 | B09A | Decision: intention to grant [chapter 9.1 patent gazette]
2021-02-17 | B16A | Patent or certificate of addition of invention granted. Free format text: Term of validity: 20 (twenty) years counted from 09/03/2011, subject to the legal conditions.
Priority:
Application number | Filing date | Patent title
US31250310P | 2010-03-10
US61/312,503 | 2010-03-10
PCT/EP2011/053541 (WO2011110594A1) | 2011-03-09 | Audio signal decoder, audio signal encoder, method for decoding an audio signal, method for encoding an audio signal and computer program using a pitch-dependent adaptation of a coding context